Chapter 26...

Processes and Job Control

Within Unix, every object is represented by either a file or a process. In simple terms, a file is an input source or output target, while a process is a program that is executing. Files offer access to data; processes make things happen.

It is very important that you have a firm understanding of both files and processes. We discussed files in detail in Chapters 23, 24 and 25. In this chapter, we will cover processes and the related topic of job control. To do so, we will consider several key questions. Where do processes come from? How are they managed by the system? How do you control your own processes?

As you read this chapter, much of what we have discussed throughout the book will come together in a way that will bring you a great deal of satisfaction. Once you understand processes and how they are managed, you will appreciate the richness of Unix, and how its various parts interact to form a complex, elegant system.

How the Kernel Manages Processes

In Chapter 6, we discussed the idea of a process, a program that is executing. More precisely, a PROCESS is a program that is loaded into memory and ready to run, along with the program's data and the information needed to keep track of the program. All processes are managed by the kernel, the central part of the operating system. The details, as you can imagine, are complex, so let me offer you a summary.

When a process is created, the kernel assigns it a unique identification number called a PROCESS ID or PID (pronounced as three separate letters, "P-I-D"). To keep track of all the processes in the system, the kernel maintains a PROCESS TABLE, indexed by PID, containing one entry for each process. Along with the PID, each entry in the table contains the information necessary to describe and manage the process.

Does this arrangement sound familiar? It should, because it is similar to the system of inumbers and inodes we discussed in Chapter 25. As you will remember, every file has a unique identification number called its inumber, which is used as an index into the inode table. Each inode contains the information necessary to describe and manage a particular file. Thus, the process table is similar to the inode table. Analogously, a process ID corresponds to an inumber, while an entry in the process table corresponds to an inode.

A small Unix system can easily have over 100 processes running at the same time. Some of them, of course, are programs run by the users. Most processes, however, are started automatically to perform tasks in the background. On a large system, there can be hundreds or even thousands of processes all needing to share the system's resources: processors, memory, I/O devices, network connections, and so on. In order to manage such a complex workload, the kernel provides a sophisticated scheduling service, sometimes referred to as the SCHEDULER.

At all times, the scheduler maintains a list of all the processes waiting to execute. Using a complicated algorithm, the scheduler chooses one process at a time, and gives it a chance to run for a short interval called a TIME SLICE. (On a multiprocessor system, the scheduler will choose more than one process at a time.)

When we talk about concepts such as time slices, we often refer to processing time as CPU TIME. This term dates back to the olden days — before modern single-chip processors — when the bulk of the computation was performed by a central processing unit or CPU.

A typical time slice consists of about 10 milliseconds (10 thousandths of a second) of CPU time. Once the time slice is over, the process goes back on the scheduling list and another process is started. In this way, every process, eventually, is given enough CPU time to complete its work. Although a time slice isn't very long by human standards, modern processors are very, very fast and 10 milliseconds is actually long enough to execute millions of instructions. (Think about that for a moment.)

Each time a process finishes its time slice, the kernel needs to put the process on hold. However, this must be done in such a way that, later, when the process is restarted, it is able to continue exactly where it left off. To make this possible, the kernel saves data for every process that is interrupted. For example, the kernel will save the location of the next instruction to be executed within the program, a copy of the environment, and so on.

Forking Till You Die

So how are processes created? With one notable exception (discussed later in the chapter), every process is created by another process. Here is how the system works.

As we discussed in Chapter 2, the kernel is the core of the operating system. As such, the kernel provides essential services to processes, specifically:

• Memory management (virtual memory management, including paging)

• Process management (process creation, termination, scheduling)

• Interprocess communication (local, network)

• Input/output (via device drivers, programs that perform the actual communications with physical devices)

• File management

• Security and access control

• Network access (such as TCP/IP)

When a process needs the kernel to perform a service, it sends the request by using a SYSTEM CALL. For example, a process would use a system call to initiate an I/O operation. When you write programs, the exact method of using a system call depends on your programming language. In a C program, for example, you would use a function from a standard library. Unix systems typically have between 200 and 300 system calls, and part of becoming a programmer is learning how to use them, at least the most important ones.

The most important system calls are the ones used for process control and I/O (see Figure 26-1). Specifically, the system calls used to create and use processes are fork, exec, wait, and exit.

Figure 26-1: Commonly used system calls

Many important tasks can be performed only by the kernel. When a process needs to perform one of these tasks, it must use a system call to send a request to the kernel to do the job. Unix/Linux systems generally have 200-300 different system calls. The most commonly used system calls are the ones used for process control (fork, wait, exec, exit and kill), and file I/O (open, read, write and close).

System Call   Purpose
fork          create a copy of the current process
wait          wait for another process to finish executing
exec          execute a new program within the current process
exit          terminate the current process
kill          send a signal to another process
open          open a file for reading or writing
read          read data from a file
write         write data to a file
close         close a file

The fork system call creates a copy of the current process. Once this happens, we call the original process the PARENT PROCESS or, more simply, the PARENT. The new process, which is an exact copy of the parent, is called the CHILD PROCESS or the CHILD. The wait system call forces a process to pause until another process has finished executing. The exec system call changes the program that a process is running. And, finally, the exit system call terminates a process. To make it easy to talk about these concepts, we often use the words FORK, EXEC, WAIT and EXIT as verbs. For example, you might read, "When a process forks, it results in two identical processes."

What is amazing is that, by using only these four basic system calls (with a few minor variations we can ignore), Unix processes are able to coordinate the elaborate interaction that takes place between you, the shell, and the programs you choose to run. To illustrate how it works, let's consider what happens when you enter a command at the shell prompt.

As you know (from Chapter 11), the shell is a program that acts as a user interface and script interpreter. It is the shell that enables you to enter commands and, indirectly, to access the services of the kernel. Although the shell is important, once it is running it is just another process, one of many in the system. Like all processes, the shell has its own PID (process ID) and its own entry in the process table. In fact, at any time, you can display the PID of the current shell by displaying the value of a special shell variable with the odd name of $ (dollar sign):

echo $$

(See Chapter 12 for a discussion of shell variables.)

As we discussed in Chapter 13, there are two types of commands: internal and external. Internal or builtin commands are interpreted directly by the shell, so there is no need to create a new process. External commands, however, require the shell to run a separate program. As such, whenever you want to run an external command, the shell must create a new process. Here is how it works.

The first thing the shell does is use the fork system call to create a brand new process. The original process becomes the parent, and the new process is the child. As soon as the forking is done, two things happen. First, the child uses the exec system call to change itself from a process running the shell into a process running the external program. Second, the parent uses the wait system call to pause itself until the child is finished executing.
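The parent and child relationship can be observed from the shell itself. The following is a minimal sketch, not a definitive demonstration: it assumes a POSIX-style shell in which the PPID variable (not discussed in the text above) holds the process ID of a process's parent. The name parent_pid is chosen only for illustration.

```sh
# Remember the PID of the shell that is running these commands.
parent_pid=$$
echo "parent shell: PID $parent_pid"

# sh is an external program, so the shell forks a child, the child execs sh,
# and the parent waits. Inside the child, $$ is the child's own PID and
# $PPID is the PID of its parent.
sh -c 'echo "child process: PID $$, parent PID $PPID"'

# Control returns here only after the child has finished.
echo "child finished; parent shell $parent_pid continues"
```

When these commands are typed at a shell prompt, the parent PID reported by the child should normally match the PID printed on the first line.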

Eventually, the external program finishes, at which time the child process uses the exit system call to stop itself. Whenever a process stops permanently, for whatever reason, we say that the process DIES or TERMINATES. In fact, as you will see later in the chapter, when we stop a process deliberately, we say that we "kill" it.

Whenever a process dies, all the resources it was using — memory, files, and so on — are deallocated, so they can be used by other processes. At this point, the defunct process is referred to as a ZOMBIE. Although a zombie is dead and is no longer a real process, it still retains its entry in the process table. This is because the entry contains data about the recently departed child that may be of interest to the parent.

Immediately after a child turns into a zombie, the parent — which has been waiting patiently for that child to die — is woken up by the kernel. The parent now has an opportunity to look at the zombie's entry in the process table to see how things turned out. The kernel then removes the entry from that table, effectively extinguishing the last remnant of the child's short, but useful life.
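One piece of information the parent can learn about a dead child is its exit status. As a small sketch (the variable name status and the exit value 3 are arbitrary choices for illustration), the shell makes the status of the most recent child available in the special variable ?:

```sh
status=0

# The child is a new sh process that does nothing but exit with status 3.
# The parent shell waits for it, then records the status it collected.
sh -c 'exit 3' || status=$?

echo "child exited with status $status"   # prints: child exited with status 3
```

By convention, a status of 0 means success and any other value indicates some kind of failure.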

To illustrate the procedure, let's consider what happens when you enter a command to run the vi text editor. The first thing the shell does is fork to create a child process, identical to itself. It then begins waiting for the child to die.(*) At the same instant, the child uses exec to change from a process running the shell to a process running vi. What you notice is that, an instant after you enter the vi command, the shell prompt is replaced by the vi program.

* Footnote

Unix programming is not for the faint of heart.

When you finish working with vi you quit the program. This kills the child process that has been running vi and turns it into a zombie. The death of the child causes the kernel to wake up the parent. This, in turn, causes the zombie to be removed from the process table. At the same time, the original process returns to where it was. What you notice is that, an instant after you stop the vi program, you see a new shell prompt.

Orphans and Abandoned Processes

You might ask, what if a parent forks and then dies unexpectedly, leaving the child all alone? The child, of course, keeps executing but it is now considered to be an ORPHAN. An orphan can still do its job, but when it dies there is no parent to wake up. As a result, the dead child — now in the form of a zombie — is stuck in limbo.

In the olden days, an orphaned zombie would stay in the process table forever or until the system was rebooted (whichever came first). On modern Unix systems, orphaned processes are automatically adopted by process #1, the init process (discussed later in the chapter). In this way, whenever an orphan dies, init, acting in loco parentis, is able to swoop down and, without a trace of hesitation, initiate the steps that will lead to the destruction of the zombie.

A similar situation arises when a parent creates a child, but does not have the good manners to wait for the child to die. Later, when the child dies and turns into a zombie, the neglectful parent has left the poor zombie — like Mariana in the Moated Grange(*) — saying to itself, "My life is dreary, he cometh not, I am aweary, aweary, I would that I were dead!"

* Footnote

See the poem "Mariana in the Moated Grange" by Alfred Tennyson.

Fortunately, this is an uncommon event. In fact, such occurrences generally happen only when a program has a bug that allows the program to create a child without waiting for the child to die. Interestingly enough, if one of your own programs inadvertently creates an immortal zombie in this manner, there is no direct way for you to get rid of it. After all, how can you kill something that is already dead?

To get rid of an abandoned child that has become a zombie, you can use the kill program (described later in the chapter) to terminate the parent. Once the parent dies, the zombie becomes an orphan, which will automatically be adopted by the init process. In due course, init will fulfil its destiny as a responsible step-parent by driving the final stake through the heart of the zombie.

Is Unix programming cool or what?

Distinguishing Between Parent and Child

Earlier in the chapter, I explained that the shell executes an external command by forking to create a child process. The child process then execs to run the command, while the parent waits for the child to terminate.

As we discussed, forking results in two identical processes: the original (the parent) and the copy (the child). But one process has to wait and the other has to run a program. If the parent and child are identical, how does the parent know it's the parent, and how does the child know it's the child? In other words, how do they each know what to do?

The answer is that, when the fork system call has finished its work, it passes a numeric value, called a RETURN VALUE, to both the parent and the child process. The return value for the child is set to 0 (zero). The return value for the parent is set to the process ID of the newly created child. Thus, after a fork operation is complete, a process can tell if it is the parent or the child simply by checking the return value. If the return value is greater than zero, the process knows it is the parent. If the return value is 0, the process knows it is the child.

So what happens when you run an external command? After the shell forks, there are two identical shells. One is the parent; the other is the child but, at first, they don't know which is which. To figure it out, each process checks the return value it received from fork. The shell with the positive return value knows it is the parent, so it uses the wait system call to pause itself. The shell with the zero return value knows it is the child, so it uses the exec system call to run the external program. (Like all tricks, it doesn't look like magic once you understand how it is done.)

The Very First Process: init

In the next section, we will begin to take a look at the day-to-day programs and techniques you will use to control your processes. Before we do, however, I want to digress for a moment to discuss a very interesting observation. If processes are created by forking, every child process must have a parent. But then, that parent must have a parent of its own, and so on. Indeed, if you trace the generations back far enough, you come to the conclusion that, somewhere, there must have been a very first process.

That conclusion is correct. Every Unix system has a process that — at least indirectly — is the parent of all the other processes in the system. The details can vary from one type of Unix to another, but all I want is for you to understand the general idea. I'll describe how it works with Linux.

In Chapter 2, we talked about the boot procedure, the complex set of steps that starts the operating system. Towards the end of the boot procedure, the kernel creates a special process "by hand", that is, without forking. This process is given a PID (process ID) of 0. For reasons I will explain in a moment, process #0 is referred to as the IDLE PROCESS.

After performing some important functions (such as initializing the data structures needed by the kernel), the idle process forks, creating process #1. The idle process then execs to run a very simple program that is essentially an infinite loop doing nothing. (Hence, the name "idle process".) The idea is that, whenever there are no processes waiting to execute, the scheduler will run the idle process. By the time process #0 has metamorphosed into the idle process, it has served its purpose and, effectively, has vanished. Indeed, if you use the ps command (discussed later) to display the status of process #0, the kernel will deny that the process even exists.

But what of process #1? It carries out the rest of the steps that are necessary to set up the kernel and finish the boot procedure. For this reason, it is called the INIT PROCESS, and the actual program itself is named init. Specifically, the init process opens the system console (see Chapter 3) and mounts the root filesystem (see Chapter 23). It then follows the instructions contained in the configuration file /etc/inittab. In doing so, init forks multiple times to create the basic processes necessary for running the system (such as setting the runlevel; see Chapter 6) and enabling users to log in. As a result, init becomes the ancestor of all the other processes in the system.(*)

* Footnote

Process #0, the idle process, forks to create process #1, the init process. Thus, strictly speaking, the ultimate ancestor of all processes is actually process #0. However, once process #0 finishes its job, it effectively disappears. Thus, we can say that process #1 is the only living ancestor of all the processes in the system.

In fact, if a process were ever to become interested in genealogy, it would not be able to trace its roots past process #1, because processes are not allowed to read the source code for the kernel.

Unlike the idle process (#0), the init process (#1) never stops running. In fact, it is the first process in the process table, and it stays there until the system is shut down. Even after the system has booted, init is still called upon to perform important actions from time to time. For example, as we have already discussed, when a parent process dies before its child, the child becomes an orphan. The init program automatically adopts all orphans to make sure their deaths are handled properly.

Later in the chapter, when we discuss the ps (process status) command, I will show you how to display the process IDs of both a process and its parents. You will see that if you trace the ancestry of any process in the system far enough, it will always lead you back to process #1.

Foreground and Background Processes

When you run a program, the input and output are usually connected to your terminal. For text-based programs, input comes from your keyboard, and output goes to your monitor. This only makes sense because most of the programs you will use need to interact with you in order to do their job.

Some programs, however, can run by themselves without tying up your terminal. For example, let's say you want to use a program to read a very large amount of data from a file, sort that data, and then write the output to another file. There is no reason why such a program can't work on its own without your intervention.

As we have discussed, whenever you enter a command to run a program, the shell waits for the program to finish before asking you to enter another command. However, if you were using the sorting program I described above, there would be no need to wait for it to finish. You could enter the command to start the program and then move right along to the next command, leaving the program to run on its own.

To do this, all you have to do is type an & (ampersand) character at the end of the command. This tells the shell that the program you are running should execute all by itself. For example, say the command to run the sorting program is as follows:

sort < bigfile > results

If you enter this exact command, the shell will launch the program and wait until the program finishes executing. It is only when the program is done that the shell will display a prompt to tell you that it is waiting for a new command. However, it works differently when you enter the command with an & at the end:

sort < bigfile > results &

In this case, the shell does not wait for the program to finish. As soon as the program starts, the shell regains control and displays a new prompt. This means you can enter another command without waiting for the first program to finish.

When the shell waits for a program to finish before prompting you to enter a new command, we say the process is running in the FOREGROUND. When the shell starts a program, but then leaves it to run on its own, we say the process is running in the BACKGROUND. In the first example, we ran the sort program in the foreground. In the second example, we ran sort in the background by typing an & character at the end of the command.
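The difference is easy to see in a short experiment. This is a sketch only: it uses the shell variable ! (the PID of the most recent background command) and the shell's builtin wait command, and the name bg_pid is chosen for illustration.

```sh
# Start a 2-second delay in the background; the shell does not wait for it.
sleep 2 &
bg_pid=$!
echo "started background process $bg_pid; the shell is free for other work"

# Now wait explicitly, as the shell would do for a foreground process.
wait "$bg_pid"
echo "background process $bg_pid has finished"
```

The first message appears immediately, while the second appears only after the delay is over.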

As I explained in Chapter 15, most Unix programs are designed to read input from standard input (stdin) and write output to standard output (stdout). Error messages are written to standard error (stderr). When you run a program from the shell prompt, stdin is connected to your keyboard, and stdout and stderr are connected to your monitor. If you want to change this, you can redirect stdin, stdout and stderr at the time you run the program.

Reading from the keyboard and writing to the monitor works fine when you run a process in the foreground. However, when you run a program in the background, the process executes on its own to allow you to enter another command. What happens, then, if a background process tries to read or write to standard I/O? The answer is that the input is disconnected, but the output connections do not change.

This has two important implications. First, if a process running in the background tries to read from stdin, there will be nothing there and the process will pause indefinitely, waiting for input. The process wants to read, and it is going to wait and wait and wait until you give it something to read. In such cases, the only thing you can do is move the process to the foreground (discussed later). This allows you to interact with the process and give it what it wants.

Second, if a process running in the background writes to either stdout or stderr, the output will show up on your monitor. However, since you are probably working on something else, the output will be mixed in with whatever else you are doing, which can be confusing and distracting.

Creating a Delay: sleep

In order to demonstrate how background output can get mixed up with foreground output, I have a short experiment for you. In this experiment, we are going to run a sequence of two commands in the background. The first command will create a delay; the second command will then write some output to the terminal. In the meantime (during the delay), we will start another program in the foreground. You will then see what happens when a background process writes its output to the screen while you are working in the foreground.

Before we get started, I want to tell you about the tool we will use to create the delay. We will be using a program named sleep. The syntax is:

sleep interval[s|m|h|d]

where interval is the length of the delay.

Using sleep is straightforward. Just specify the length of the delay you want in seconds. For example, to pause for 5 seconds, use:

sleep 5

If you enter this command at your terminal, it will seem as if nothing is happening. However, 5 seconds later, the program will finish and you will see a new shell prompt.

With Linux or any other system that uses the GNU utilities (see Chapter 2), you can specify a one-letter modifier after the interval: s for seconds (the default), m for minutes, h for hours, and d for days. For example:

sleep 5
sleep 5s
sleep 5m
sleep 5h
sleep 5d

The first two commands pause for 5 seconds. The next three commands pause for 5 minutes, 5 hours, and 5 days respectively.

Most often, we use sleep within a shell script to create a specific delay. For example, let's say Program A writes data to a file that is needed by Program B. You may need to write a shell script to make sure the data file exists before you run Program B. Within the script, you use sleep to create, say, a 5 minute delay within a loop. Every 5 minutes, your script checks to see if the file exists. If not, it waits for another 5 minutes and tries again. Eventually, when the file is detected, the script moves on and runs Program B.
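Such a polling loop might look something like the following sketch. Everything named here is hypothetical: datafile stands for the file written by Program A, a short background command stands in for Program A itself, and the final echo stands in for running Program B. To keep the example quick to try, the delay is 1 second; for the 5 minute delay described above you would use sleep 300 (or sleep 5m with the GNU utilities).

```sh
# Hypothetical data file that Program A is expected to create.
datafile="${TMPDIR:-/tmp}/program_a_output.$$"

# Stand-in for Program A: create the file after a short delay.
(sleep 1; echo "data" > "$datafile") &

# Check for the file; if it is not there, sleep and then try again.
attempts=0
until [ -e "$datafile" ]; do
    attempts=$((attempts + 1))
    sleep 1
done

# The file exists, so this is where the script would run Program B.
echo "found $datafile after $attempts checks"

wait                # let the stand-in for Program A finish
rm -f "$datafile"   # clean up the demonstration file
```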

At the command line, sleep is useful when you want to wait a specified amount of time before running a command, which is what we will be doing. To run the experiment, I want you to enter the following two command lines quickly, one right after the other:

(sleep 20; cat /etc/passwd) &
vi /etc/termcap

The first command runs in the background. It pauses for 20 seconds and then copies the contents of the password file (Chapter 11) to your terminal. The second command runs in the foreground. It uses the vi text editor (Chapter 22) to look at the Termcap file (Chapter 7).

After you enter the second command and vi has started, wait a short time and you will see the contents of the password file splattered all over your screen. Now you can appreciate how irritating it is when a background process writes its output to your terminal when you are working on something else.

The moral? Don't run programs in the background if they are going to read from or write to the terminal.

(Within vi: To redraw the screen, press ^L. This is a handy command to remember for just such occasions. To quit, type :q and then press <Return>.)

A program is a candidate to run as a background process only if it does not need to run interactively; that is, if it does not need to read from your keyboard or write to your screen. Consider our earlier example:

sort < bigfile > results &

In this case, we can run the program in the background because it gets its input from one file (bigfile) and writes its output to another file (results).

Interestingly enough, the shell will allow you to run any program in the background; all you have to do is put an & character at the end of the command. So be thoughtful. Don't, for instance, run vi, less or other such programs in the background.

— hint —

If you accidentally run an interactive program in the background, you can terminate it by using the kill command, discussed later in the chapter.

— technical hint —

Compiling a source program is a great activity to run in the background. For example, let's say you are using the gcc compiler to compile a C program named myprog.c. Just make sure to redirect the standard error to a file, and everything will work fine. The first command below does the job for the Bourne shell family (Bash, Korn Shell). The second command is for the C-Shell family (Tcsh, C-Shell):

gcc myprog.c 2> errors &
gcc myprog.c >& errors &

Another common situation occurs when you need to build a program that uses a makefile. For example, say you have downloaded a program named game. After unpacking the files, you can use make to build the program in the background:

make game > makeoutput 2> makeerrors &

In both cases, the shell will display a message for you when the program has finished.
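If you want to try the redirection pattern without a real compiler or makefile, here is a small sketch for the Bourne shell family. A one-line sh command stands in for the build, and the names workdir, build_output and build_errors are chosen only for illustration. It assumes the mktemp program is available to create a scratch directory.

```sh
workdir=$(mktemp -d)    # scratch directory for this demonstration

# Stand-in for a build: one line to standard output, one to standard error,
# each redirected to its own file while the job runs in the background.
sh -c 'echo "compiling"; echo "warning: example" >&2' \
    > "$workdir/makeoutput" 2> "$workdir/makeerrors" &

wait $!                 # pause until the background job has finished

build_output=$(cat "$workdir/makeoutput")
build_errors=$(cat "$workdir/makeerrors")
echo "standard output file contains: $build_output"
echo "standard error file contains: $build_errors"

rm -r "$workdir"        # clean up
```

Nothing is written to the terminal by the background job itself; all of its output ends up in the two files.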

Job Control

In the early 1970s, the first Unix shells offered very little in the way of process control. When a user ran a program, the resulting process used the terminal for standard input, standard output, and standard error. Until that process finished, the user was unable to enter any more commands. If it became necessary to terminate the process before it finished on its own, the user could press either ^C to send the intr signal or ^\ to send the quit signal (see Chapter 7). The only difference was that quit would generate a core dump for debugging.

Alternatively, a user could type an & (ampersand) character at the end of the command line to run the program as an ASYNCHRONOUS PROCESS. An asynchronous process had two defining characteristics. First, by default, standard input would be connected to the empty file /dev/null. Second, because the process was running on its own without any input from the user, it would not respond to the intr or quit signals.
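Modern shells still behave this way when job control is turned off, as it is by default in a non-interactive shell. The following sketch assumes such a shell: it starts wc in the background and asks it to count the bytes it can read from standard input. The variable name bytes is chosen for illustration.

```sh
# Inside a non-interactive sh, run wc in the background and wait for it.
# With job control off, the background command reads from /dev/null.
bytes=$(sh -c 'wc -c & wait')

echo "bytes read from standard input by the background wc:" $bytes
```

Because /dev/null is empty, wc finishes at once and reports a count of 0 instead of waiting for the keyboard.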

Today, we have GUIs, terminal windows, and virtual consoles, which make it easy to run more than one program at the same time. In the 1970s, however, being able to create asynchronous processes was very important, as it enabled users to start programs that would run by themselves without tying up the terminal. For example, if you had a long source program to compile, you could use an asynchronous process to do the job. Once the process started, your terminal would be free, so you wouldn't have to stop working. Of course, if the asynchronous process got into trouble, you would not be able to terminate it with ^C or ^\. Instead, you would have to use the kill command (covered later in the chapter).

As we discussed in Chapter 11, the original Bourne Shell was created in 1976 by Stephen Bourne at Bell Labs. This shell was part of AT&T Unix and it supported asynchronous processes, which is all there was until 1978 when Bill Joy, a graduate student at U.C. Berkeley, created a brand new shell, which he called the C-Shell (see Chapter 11). As part of the C-Shell, Joy included support for a new capability called JOB CONTROL. (Joy also added several other important new features, such as aliases and command history.)

Job control made it possible to run multiple processes: one in the foreground, the rest in the background. Within the C-Shell, a user could pause any process and restart it as needed. He could also move processes between the foreground and background, suspend (pause) them, and display their status. Joy included the C-Shell in BSD (Berkeley Unix), and job control proved to be one of the shell's most popular features. Even so, AT&T Unix did not get job control for four more years, until David Korn included it in the first Korn Shell in 1982. Today, job control is supported by every important Unix shell.

The essential feature of job control is that every command you enter is considered to be a JOB, identified by a unique JOB NUMBER, also referred to as a JOB ID (pronounced "job-I-D"). To control and manipulate your jobs, you use the job ID along with a variety of commands, variables, terminal settings, shell variables, and shell options. For reference, these tools are summarized in Figure 26-2.

Figure 26-2: Job control: Tools

Job control is a feature supported by the shell that enables you to run multiple jobs, one in the foreground, the rest in the background. You can selectively suspend (pause) jobs, restart them, move them between the foreground and background, and display their status. To do so, you use a variety of commands, variables, terminal settings, shell variables, and shell options.

Job Control Commands
jobs                display list of jobs
ps                  display list of processes
fg                  move job to foreground
bg                  move job to background
suspend             suspend the current shell
^Z                  suspend the current foreground job
kill                send signal to job; by default, terminate job

Variables
echo $$             display PID of current shell
echo $!             display PID of the last command you moved to the background

Terminal Settings
stty tostop         suspend background jobs that try to write to the terminal
stty -tostop        turn off tostop

Shell Options: Bash, Korn Shell
set -o monitor      enable job control
set +o monitor      turn off monitor
set -o notify       notify immediately when background jobs finish
set +o notify       turn off notify

Shell Variables: Tcsh, C-Shell
set listjobs        list all jobs whenever a job is suspended (Tcsh only)
set listjobs=long   list jobs with a long listing (Tcsh only)
set notify          notify immediately when background jobs finish
unset notify        turn off notify

Within the Bourne Shell family (Bash, Korn Shell), job control is enabled when the monitor option is set. This is the default for interactive shells, but it can be turned off by unsetting the option (see Chapter 12). With the C-Shell family (Tcsh, C-Shell), job control is always turned on for interactive shells.

It is natural to wonder, how is a job different from a process? For practical purposes, the two concepts are similar, and you will often see people use the terms "job" and "process" interchangeably. Strictly speaking, however, there is a difference. A process is a program that is executing or ready to execute. A job refers to all the processes that are necessary to interpret an entire command line. Where processes are controlled by the kernel, jobs are controlled by the shell, and in the same way that the kernel uses the process table to keep track of processes, the shell uses a JOB TABLE to keep track of jobs.

As an example, let's say you enter the following simple command to display the time and date:

date

This command generates a single process, with its own process ID, and a single job with its own job ID. As the job runs, there will be one new entry in the process table and one new entry in the job table. Now consider the following more complicated command lines. The first uses a pipeline consisting of four different programs. The second executes four different programs in sequence:

who | cut -c 1-8 | sort | uniq -c
date; who; uptime; cal 12 2008

Each of these command lines generates four different processes, one for each program, and each process has its own process ID. However, the entire pipeline — no matter how many processes it might require — is considered to be a single job, with a single job ID. While the job is running, there will be four entries in the process table, but only a single entry in the job table.

At any time, you can display a list of all your processes by using the ps (process status) command. Similarly, you can display a list of your jobs by using the jobs command. We'll discuss the details later in the chapter.
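The difference between a job and a process can be checked from a script. Here is a minimal sketch, assuming a Bourne-family shell such as Bash. The sleep commands stand in for real programs, and the names jobs_file and job_count are invented for the illustration. It starts a three-program pipeline in the background and counts the entries reported by the jobs command.

```shell
# Scratch file for the output of the jobs command (hypothetical name).
jobs_file=$(mktemp)

# One command line, three programs: the shell treats it as a single job.
sleep 2 | sleep 2 | sleep 2 &

# Redirecting to a file keeps the jobs builtin in the current shell,
# so it reports on this shell's own job table.
jobs > "$jobs_file"
job_count=$(grep -c . "$jobs_file")
echo "entries in the job table: $job_count"

# The long listing also shows the process IDs that make up the job.
jobs -l

# Wait for the pipeline to finish, then remove the scratch file.
wait
rm -f "$jobs_file"
```

Although three processes are running, the shell should report only one job.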

Running a Job in the Background

To run a job in the background, you type an & character at the end of the command. For example, the following command runs the ls program in the background, with the output redirected to a file named temp:

ls > temp &

Each time you run a job in the background, the shell displays the job number and the process ID. The shell assigns the job numbers itself, starting from 1. For example, if you create 4 jobs, they will be assigned job numbers 1, 2, 3 and 4. The kernel assigns the process ID, which is, in most cases, a multi-digit number.

As an example, let's say you entered the command above. The shell displays the following:

[1] 4003

This means that job #1 has just been started, with a process ID of 4003. If your job consists of a pipeline with more than one program, the process ID you see will be that of the last program in the pipeline. For example, let's say you enter:

who | cut -c 1-8 | sort | uniq -c &

The shell displays the following:

[2] 4354

This tells you that you have started job #2 and that the process ID of the last program (uniq) is 4354.

Since background jobs run by themselves, there is no easy way for you to keep track of their progress. For this reason, the shell sends you a short status message whenever a background job finishes. For example, when the job in our first example finishes, the shell will display a message similar to the following:

[1] Done    ls > temp

This message notifies you that job #1 has just finished.
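In a script, where there is no prompt at which to display status messages, the usual way to follow a background job is to save its process ID and wait for it. The following is only a sketch under stated assumptions: the scratch directory and the variable names are made up for the example, and wait is a standard shell builtin that is not covered in this section.

```shell
# Scratch directory so the example leaves nothing in your own files
# (hypothetical name).
workdir=$(mktemp -d)

# Start the job in the background, in the same way as the ls example.
ls "$workdir" > "$workdir/temp" &
bg_pid=$!                 # $! is the PID of the most recent background command
echo "started background process $bg_pid"

# wait blocks until the given process ends and returns its exit status.
wait "$bg_pid"
bg_status=$?
echo "background job finished with exit status $bg_status"
```

An exit status of 0 means that ls completed successfully.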

If you are waiting for a particular background job to finish, such notifications are important. However, it would be irritating if the shell displayed status messages willy-nilly when you were in the middle of doing something else, such as editing a file or reading a man page. For this reason, when a background job ends, the shell does not notify you immediately. Instead, it waits until it is time to display the next shell prompt. This prevents the status message from interfering with the output from another program.

If you do not want to wait for such messages, there is a setting you can change to force the shell to notify you the instant a background job finishes, regardless of what else you might be doing. With the Bourne shell family (Bash, Korn Shell), you set the notify option:

set -o notify

To unset this option, use:

set +o notify

With the C-Shell family (Tcsh, C-Shell), you set the notify variable:

set notify

To unset this variable, use:

unset notify

For a discussion of how to use shell options and shell variables, see Chapter 12. If you want to make the setting permanent, just place the appropriate command in your environment file (Chapter 14).
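If you are not sure whether notify is currently set, you can read the option back from the shell. Here is a small sketch for the Bourne shell family, assuming that set -o with no arguments lists each option followed by on or off, as it does in Bash. The variable names are invented for the example.

```shell
# Turn the notify option on, then read its state back from "set -o",
# which lists every shell option with "on" or "off".
set -o notify
notify_on=$(set -o | awk '$1 == "notify" { print $2 }')
echo "after set -o notify: $notify_on"

# Turn the option off again and check once more.
set +o notify
notify_off=$(set -o | awk '$1 == "notify" { print $2 }')
echo "after set +o notify: $notify_off"
```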

Suspending a Job: fg

At any time, every job is in one of three STATES: running in the foreground; running in the background; or paused, waiting for a signal to resume execution. To pause a foreground job, you press ^Z (Ctrl-Z). As described in Chapter 7, this sends the susp signal, which causes the process to pause. When you pause a process in this way, we say that you SUSPEND it or STOP it.

The terminology can be a bit misleading, so let's take a moment to discuss it. The term "stop" refers to a temporary pause. Indeed, as you will see, a stopped job can be restarted. Thus, when you press ^Z it merely pauses the job. If you want to halt a process permanently, you must press ^C or use the kill command (both of which are discussed later in the chapter).
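To see that a stopped process is only paused, you can watch its state code change. A script cannot press ^Z, so the sketch below uses kill -STOP and kill -CONT as stand-ins for suspending and resuming a job. It also assumes that your version of ps accepts -o stat= to print only the state code, an option outside the basic ones described in this chapter. The helper names state_of and wait_for_state are hypothetical.

```shell
# Start a long-running job in the background and remember its PID.
sleep 30 &
pid=$!

# Helper (hypothetical name): print the state code ps reports for a PID.
state_of() { ps -o stat= -p "$1" | tr -d ' '; }

# Helper (hypothetical name): poll up to five times, one second apart,
# until the state code matches the shell pattern given as $2.
wait_for_state() {
    tries=0
    while [ "$tries" -lt 5 ]; do
        case "$(state_of "$1")" in $2) return 0 ;; esac
        sleep 1
        tries=$((tries + 1))
    done
    return 1
}

kill -STOP "$pid"                 # pause the process (stand-in for ^Z)
wait_for_state "$pid" 'T*'
stopped_state=$(state_of "$pid")
echo "after STOP: $stopped_state"

kill -CONT "$pid"                 # let the process run again
wait_for_state "$pid" '[!T]*'
resumed_state=$(state_of "$pid")
echo "after CONT: $resumed_state"

kill "$pid"                       # halt the process permanently
wait "$pid" 2>/dev/null
```

A state code that begins with T indicates a stopped process. After the process is continued, the code should change to something else, typically S for a sleeping process.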

When you stop a program, the shell puts it on hold and displays a new shell prompt. You can now enter as many commands as you like. When you want to resume working with the suspended program, you move it back to the foreground by using the fg command. Using ^Z and fg in this way enables you to suspend a program, enter some commands, and then return to the original program whenever you want. Here is a typical example of how you might make use of this facility.

You are using the vi text editor to write a shell script. Within the script, you want to use the cal command to display a calendar, but you are not sure about the syntax. You suspend vi, display the man page for cal, find out what you want, and then return to vi exactly where you left off. Here is how you do it. To start, enter the following command to run vi:

vi script

You are now editing a file named script. Pretend you have typed several lines of the script and you need to find out about the cal program. To suspend vi, press ^Z:

^Z

The shell pauses vi and displays an informative message:

[3]+ Stopped    vi script

In this case, the message tells you that vi, job #3, is now suspended. You are now at the shell prompt. Enter the command to display the man page for cal:

man cal

Look around for a bit, and then press q to quit. You will see the shell prompt. You can now restart vi by moving it back into the foreground:

fg

You are now back in vi, exactly where you left off. (When you want to quit vi, type :q and press <Return>.)

— hint —

If you are working and, all of a sudden, your program stops and you see a message like "Stopped" or "Suspended", it means you have accidentally pressed ^Z.

When this happens, all you have to do is enter fg, and your program will come back to life.

When you suspend a job, the process is paused indefinitely. This can create a problem if you try to log out, because you will have suspended jobs waiting around. The rule is, when you log out, all suspended jobs are terminated automatically. In most cases, this would be a mistake. So if you try to log out and you have suspended jobs, the shell will display a warning message. Here are some examples:

There are suspended jobs.
You have stopped jobs.

If you try to log out and you see such a message, use fg to move the suspended job into the foreground and quit the program properly. If you have more than one suspended job, you must repeat the procedure for each one. This will prevent you from losing data accidentally.

On occasion, you may be completely sure that you want to log out even though you have one or more suspended jobs. If so, all you have to do is try to log out a second time. Since the shell has already warned you once, it will assume you know what you are doing, and you will be allowed to log out without a warning. Remember, though, that logging out in this way will terminate all your suspended jobs; they will not be waiting for you the next time you log in.

— hint for Tcsh users —

When you suspend a job with the Tcsh, the shell displays only a short message "Suspended" with no other information. However, if you set the listjobs variable, the Tcsh will display a list of all your jobs whenever any job is suspended. The command to use is:

set listjobs

If you give listjobs a value of long, the Tcsh will display a "long" listing that also shows each job's process ID:

set listjobs=long

My suggestion is to put this command in your environment file (see Chapter 14) to make the setting permanent.

Suspending a Shell: suspend

Pressing ^Z will suspend whichever job is running in the foreground. However, there is one process it will not suspend: your current shell. If you want to pause your shell, you'll need to use the suspend command. The syntax is:

suspend [-f]

Why would you want to suspend a shell? Here is an example. As we discussed in Chapter 4, when you have your own Unix or Linux system, you must do your own system administration. Let's say you are logged in under your own userid, and you need to do something that requires being superuser, so you use su (Chapter 6) to start a new shell in which you are root. After you have been working for a while, you realize that you need to do something quick under your own userid. It would be bothersome to stop the superuser shell and have to restart it later, because you would lose track of your working directory, any variable changes, and so on. Instead, you can enter:

suspend

This pauses the current shell — the one in which you are superuser — and returns you to the previous shell in which you were logged in under your regular userid. When you are ready to go back to being superuser to finish your admin work, you can use the fg command to move your superuser shell back to the foreground:

fg

Here is another example. Let's say you use Bash as your default shell, but you want to experiment with the Tcsh. You enter the following command to start a new shell:

tcsh

At any time, you can use suspend to pause the Tcsh and return to Bash. Later, you can use fg to resume your work with the Tcsh.

The only restriction on suspending shells is that, by default, you are not allowed to suspend your login shell. This prevents you from putting yourself in limbo by stopping your main shell. In certain circumstances, however, you may actually want to pause a login shell. For example, when you start a superuser shell by using su - instead of su, it creates a login shell (see Chapter 6). If you want to suspend the new shell, you must use the -f (force) option:

suspend -f

This tells suspend to pause the current shell, regardless of whether or not it happens to be a login shell.

Job Control vs. Multiple Windows

In Chapter 6, we discussed a variety of ways in which you can run more than one program at a time when you are using Unix or Linux on your own computer. First, you can use multiple virtual consoles, each of which supports a completely separate work session. Second, within the GUI, you can open as many terminal windows as you want, each of which has its own CLI (command line interface) with its own shell. Finally, some terminal window programs allow you to have multiple tabs within the same window, with each tab having its own shell.

With so much flexibility, why do you need to be able to suspend processes and run programs in the background? Why not just run every program in its own window and do without job control? There are several important answers to this question.

First, your work will be a lot slower if you need to switch to a different virtual console, window, or tab every time you begin a new task. In many cases, it is a lot less cumbersome to simply pause what you are doing, enter a few commands, and then return to your original task.

Second, when you use multiple windows, you have a lot more visual elements on your screen, which can slow you down. Moreover, windows need to be managed: moved, resized, iconized, maximized, and so on. Using job control makes your life a lot simpler by reducing the mental and visual clutter.

Third, it often happens that the commands you use within a short period of time are related to a specific task or problem. In such cases, it is handy to be able to recall previous commands from your history list (see Chapter 13). When you use separate windows, the history list in one window is not accessible from another window.

Finally, there will be times when you will use a terminal emulator to access a remote host (see Chapter 3), especially if you are a system administrator. In such cases, you will have only a single CLI connected to the remote host. You will not have a GUI with multiple windows or several virtual consoles. If you are not skillful at using job control, you will only be able to run one program at a time, which will be frustrating.

As a general rule, when you need to switch between completely unrelated tasks — especially tasks that require a full screen — it makes sense to use multiple windows or separate virtual consoles. In most other cases, however, you will find that job control works better and faster.

Displaying a List of Your Jobs: jobs

At any time, you can display a list of all your jobs by using the jobs command. The syntax is:

jobs [-l]

In most cases, all you need to do is enter the command name by itself:

jobs

Here is some sample output in which you can see three suspended jobs (#1, #3, #4) and one job running in the background (#2):

[1]   Stopped    vim document
[2]   Running    make game >makeoutput 2>makeerrors &
[3]-  Stopped    less /etc/passwd
[4]+  Stopped    man cal

If you would like to see the process ID as well as the job number and command name, use the -l (long listing) option:

jobs -l

For example:

[1]   2288 Stopped    vim document
[2]   2290 Running    make game >makeoutput 2>makeerrors &
[3]-  2291 Stopped    less /etc/passwd
[4]+  2319 Stopped    man cal

Notice that in both listings, one of the jobs is flagged with a + (plus sign) character. This is called the "current job". Another job is flagged with a - (minus sign) character. This is the "previous job".

These designations are used by the various commands that manipulate jobs. If you don't specify a job number, such commands will, by default, act upon the current job. (You will see this when we discuss the fg and bg commands.) In most cases, the CURRENT JOB is the one that was most recently suspended. The PREVIOUS JOB is the next one in line. In our example, the current job is #4 and the previous job is #3.

If there are no suspended jobs, the current job will be the one that was most recently moved to the background. For example, let's say you enter the jobs command and you see the following:

[2]   Running    make game >makeoutput 2>makeerrors &
[6]-  Running    calculate data1 data2 &
[7]+  Running    gcc program.c &

In this case, there are no suspended jobs. However, there are three jobs running in the background. The current job is #7. The previous job is #6.
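You can reproduce a listing like this one from a script. The sketch below assumes a Bourne-family shell such as Bash. The sleep commands are placeholders for real work, and the variable names are invented for the example. It starts three background jobs and saves the long listing in a scratch file so the entries can be counted.

```shell
# Start three background jobs; sleep stands in for real work.
sleep 30 &
pid1=$!
sleep 30 &
pid2=$!
sleep 30 &
pid3=$!

# Save the long listing, which includes the process IDs.
list_file=$(mktemp)
jobs -l > "$list_file"
cat "$list_file"
listed=$(grep -c . "$list_file")
echo "jobs listed: $listed"

# Clean up: terminate the jobs and remove the scratch file.
kill "$pid1" "$pid2" "$pid3"
wait 2>/dev/null
rm -f "$list_file"
```

The listing should contain one line per job, each with its job number, process ID, and state. The exact spacing varies from one shell to another.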

Moving a Job to the Foreground: fg

To move a job to the foreground you use the fg command. There are three variations of the syntax:

fg
fg %[job]
%[job]

where job identifies a particular job.

Although the syntax looks confusing, it's actually quite simple, as you will see. The simplest form of the command is to enter fg by itself:

fg

This tells the shell to restart the current job, the one that is flagged with a + character when you use the jobs command. For example, let's say you use the jobs command and the output is:

[1]   2288 Stopped    vim document
[2]   2290 Running    make game >makeoutput 2>makeerrors &
[3]-  2291 Stopped    less /etc/passwd
[4]+  2319 Stopped    man cal

The current job is #4, which is suspended. If you enter the fg command by itself, it restarts job #4 by moving it to the foreground.

Let's say that, in another situation, you enter the jobs command again and the output is:

[2]   Running    make game >makeoutput 2>makeerrors &
[6]-  Running    calculate data1 data2 &
[7]+  Running    gcc program.c &

In this case, the current job is #7, which is running in the background. If you enter fg by itself, it will move job #7 from the background to the foreground. This allows you to interact with the program.

To move a job that is not the current job, you must identify it explicitly. There are several ways to do so, which are summarized in Figure 26-3.

Figure 26-3: Job control: Specifying a job

To use a job control command, you must specify one or more jobs. You can refer to the jobs in several different ways: as the current job, as the previous job, using a particular job number, or using all or part of the command name.

Job Number    Meaning
%%            current job
%+            current job
%-            previous job
%n            job #n
%name         job with specified command name
%?name        job with name anywhere within the command

Most of the time, the easiest way to specify a job is to use a % (percent) character, followed by a job number. For example, to move job number 1 into the foreground, you would use:

fg %1

You can also specify a job by referring to the name of the command. For example, if you want to restart the job that was running the command make game, you can use:

fg %make

Actually, you only need to specify enough of the command to distinguish it from all the other jobs. If there are no other commands that begin with the letter "m", you could use:

fg %m

An alternative is to use %? followed by part of the command. For example, another way to move the make game command to the foreground is to use:

fg %?game

As I mentioned, if you use the fg command without specifying a particular job, fg will move the current job into the foreground. (This is the job that is marked with a + character when you use the jobs command.) Alternatively, you can use either % or %+ to refer to the current job. Thus, the following three commands are equivalent:

fg
fg %
fg %+

Similarly, you can use %- to refer to the previous job:

fg %-

This is the job that is marked with a - (minus sign) when you use the jobs command.
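The fg command needs an interactive terminal, so to try job specifications from a script you can pass them to a command that only reports on jobs. The following sketch assumes Bash and uses jobs -p, a standard option (not otherwise covered here) that prints the process ID of the job a specification resolves to. The sleep and tail commands are placeholders, and resolve is a hypothetical helper.

```shell
sleep 30 &              # job 1
pid_a=$!
tail -f /dev/null &     # job 2
pid_b=$!
sleep 40 &              # job 3
pid_c=$!

spec_file=$(mktemp)

# Helper (hypothetical name): resolve a job specification to a process ID.
# The redirection keeps jobs running in the current shell, where the job
# table lives; the result is returned in the variable "resolved".
resolve() {
    jobs -p "$1" > "$spec_file"
    resolved=$(cat "$spec_file")
}

resolve %1;     by_number=$resolved   # job number
resolve %tail;  by_name=$resolved     # start of the command name
resolve '%?40'; by_text=$resolved     # text anywhere in the command
                                      # (quoted so ? is not a wildcard)

echo "%1    -> $by_number (started as $pid_a)"
echo "%tail -> $by_name (started as $pid_b)"
echo "%?40  -> $by_text (started as $pid_c)"

# Clean up: terminate the jobs and remove the scratch file.
kill "$pid_a" "$pid_b" "$pid_c"
wait 2>/dev/null
rm -f "$spec_file"
```

Each specification should resolve to the process ID that was saved when the corresponding job was started.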

— hint —

To switch back and forth between two jobs quickly, use:

fg %-

Once you get used to this command, you will use it a lot.

As a convenience, some shells (Bash, Tcsh, C-Shell) will assume that you are using the fg command if you simply enter a job specification that begins with a % character. For example, let's say that job number 2 is the command vim document and that no other jobs use similar names. All of the following commands will have the same effect:

%2
fg %2
fg %vim
fg %?docu

In each case, the shell will move job number 2 into the foreground.

With some shells, there is one final abbreviation that you can use: a command consisting of nothing but the single character % will tell the shell to move the current job to the foreground. Thus the following four commands are equivalent:

%
fg
fg %
fg %+

Have you noticed something interesting? If you type a job specification all by itself, the shell will assume you want to use the fg command. Thus, fg is the only command in which the command name itself is optional. Remember this interesting bit of trivia; someday it will help you win friends and influence people.

— hint —

Although our examples showed several jobs suspended at the same time, you will usually pause only a single job, do something else, and then return to what you were doing.

In such cases, job control is very simple. To suspend a job, you press ^Z. To restart the job, you enter either fg or (if your shell supports it) %.

Moving a Job to the Background: bg

To move a job to the background, you use the bg command. The syntax is:

bg [%job...]

where job identifies a particular job.

To specify a job, you follow the same rules as with the fg command. In particular, you can use the variations in Figure 26-3. For instance, to move job number 2 into the background, you would use:

bg %2

If you'd like, you can move more than one job to the background at the same time, for example:

bg %2 %5 %6

To move the current job into the background, use the command name by itself, without a job specification:

bg

As you might imagine, you will use the fg command more often than the bg command. But there is one important situation when bg comes in handy. Say that you enter a command that seems to be taking a long time. If the program is not interactive, you can suspend it and move it to the background. For example, let's say that you want to use make to build a program named game, so you enter the command:

make game > makeoutput 2> makeerrors

After waiting awhile, you realize this could take a long time. Since make does not need anything from you, there is no point in tying up your terminal. Simply press ^Z to suspend the job; then enter bg to move the job to the background. Your terminal is now free.

— hint —

The bg command is useful when you intended to run a program in the background but forgot to type the & character when you entered the command. As a result, the job starts running in the foreground.

Just suspend the job by pressing ^Z, and then use the bg command to move the job into the background.

Learning to Use the ps Program

To display information about processes, you use the ps (process status) program. The ps program is a useful tool that can help you find a particular PID (process ID), check on what your processes are doing, and give you an overview of everything that is happening on the system. Unfortunately, ps has so many confusing and obtuse options that the mere reading of the man page is likely to cause permanent damage to your orbitofrontal cortex.

There are several reasons for this situation. First, as we discussed in Chapter 2, in the 1980s there were two principal branches of Unix: the official UNIX (from AT&T) and unofficial BSD (from U.C. Berkeley). UNIX and BSD each had their own version of ps, and each version had its own options. Over time, both types of ps became well-known and widely used.

As a result, many modern versions of ps support both types of options, which we refer to as the UNIX OPTIONS and the BSD OPTIONS. This is the case with Linux, for example. Thus, with the Linux version of ps, you can use either the UNIX or BSD options, whichever you prefer. From time to time, however, you will encounter versions of ps that support only the UNIX options or only the BSD options. Since you never know when you will be called upon to use such a system, you must be familiar with both types of options.

Second, ps is a powerful tool that is used by system administrators and advanced programmers for various types of analysis. As such, there are a lot of technical options that are not really necessary for everyday use. Still, they are available and, when you read the man page, the descriptions can be confusing.

Third, if you have a system that uses the GNU utilities — such as Linux (see Chapter 2) — you will find that ps supports, not only the UNIX options and BSD options, but an extra set of GNU-only options. Most of the time, however, you can ignore these options.

Finally, to add to the confusion, you will sometimes see the UNIX options referred to as POSIX OPTIONS or STANDARD OPTIONS. This is because they are used as the basis for the POSIX version of ps. (POSIX is a large-scale project, started in the 1980s, with the aim of standardizing Unix; see Chapter 11.)

By now, it should be clear that, in order to make sense out of all this, we will need a plan, so here it is.

Although ps has many, many options, very few of them are necessary for everyday work. My plan is to teach you the minimum you need to use both the UNIX options and the BSD options. We will ignore all of the esoteric options, including the GNU-only ones. Should you ever need any of the other options, you can, of course, simply check with the ps man page on your system (man ps) to see what is available.

The ps Program: Basic Skills

To display information about processes, you use the ps (process status) program. As we just discussed, ps has a great many options that can be divided into three groups: UNIX, BSD and GNU-only. I will teach you how to use the most important UNIX and BSD options, which is all you will normally need.

When it comes to ps options, there is an interesting tradition. The UNIX options are preceded by a dash in the regular manner, but the BSD options do not have a dash. Remember this when you are reading the man page: if an option has a dash, it is a UNIX option; if not, it is a BSD option. I will maintain this tradition within our discussion.

If your version of ps supports both the UNIX and BSD options, you can use whichever ones you prefer. In fact, experienced users will sometimes use one set of options and sometimes use the other, whatever happens to be best for the problem at hand. However, let me give you a warning. Don't mix the two types of options in the same command: it can cause subtle problems.

To start, here is the basic syntax to use ps with UNIX options:

ps [-aefFly] [-p pid] [-u userid]

And here is the syntax to use with BSD options:

ps [ajluvx] [p pid] [U userid]

In both cases, pid is a process ID, and userid is a userid.
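As a quick illustration of the UNIX syntax, here is a sketch that selects processes with -p and -u. It assumes a shell in which $$ holds the PID of the current shell, and it uses id -un to look up the current user name rather than hard-coding a userid. The variable names are made up for the example.

```shell
# $$ is the PID of the current shell; -p selects exactly that process.
ps -p "$$"

# -u selects every process that belongs to one userid.
# id -un prints the name of the current user.
me=$(id -un)
own_count=$(ps -u "$me" | awk 'NR > 1' | grep -c .)
echo "processes owned by $me: $own_count"

# The shell itself should appear in the -u listing (PID is column 1).
found_shell=$(ps -u "$me" | awk -v pid="$$" '$1 == pid { print "yes" }')
echo "current shell found in the list: ${found_shell:-no}"
```

The first line of each ps listing is the heading, which is why the count skips line 1.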

Rather than go through each option separately, I have summarized everything you need to know in several tables. Figure 26-4 contains the information you need to use ps with UNIX options. Figure 26-5 shows what you need for BSD options. Take a moment to look through both these figures. At first, it may look a bit confusing, but after you get used to it, everything will make sense.

Figure 26-4: The ps program: UNIX options

The ps (process status) program displays information about the processes that are running on your system. There are two sets of options you can use: UNIX options and BSD options. Here is a summary of the most important UNIX options.

Which processes are displayed?
ps              processes associated with your userid and your terminal
ps -a           processes associated with any userid and a terminal
ps -e           all processes (includes daemons)
ps -p pid       process with process ID pid
ps -u userid    processes associated with specified userid

Which data columns are displayed?
ps              PID TTY TIME CMD
ps -f           UID PID PPID C TTY TIME CMD
ps -F           UID PID PPID C SZ RSS STIME TTY TIME CMD
ps -l           F S UID PID PPID C PRI NI ADDR SZ WCHAN TTY TIME CMD
ps -ly          S UID PID PPID C PRI NI RSS SZ WCHAN TTY TIME CMD

Particularly Useful Combinations
ps              display your own processes
ps -ef          display all processes, full output
ps -a           display all non-daemon processes
ps -t -         display all daemons (only)

Figure 26-5: The ps program: BSD options

The ps (process status) program displays information about the processes that are running on your system. There are two sets of options you can use: UNIX options and BSD options. Here is a summary of the most important BSD options.

Which processes are displayed?
ps              processes associated with your userid and your terminal
ps a            processes associated with any userid and a terminal
ps ax           all processes (includes daemons)
ps p pid        process with process ID pid
ps U userid     processes associated with userid

Which data columns are displayed?
ps              PID TT STAT TIME COMMAND
ps j            USER PID PPID PGID SESS JOBC STAT TT TIME COMMAND
ps l            UID PID PPID CPU PRI NI VSZ RSS WCHAN STAT TT TIME COMMAND
ps u            USER PID %CPU %MEM VSZ RSS TT STAT STARTED TIME COMMAND
ps v            PID STAT TIME SL RE PAGEIN VSZ RSS LIM TSIZ %CPU %MEM COMMAND

Particularly Useful Combinations
ps              display your own processes
ps ax           display all processes
ps aux          display all processes, full output

Let's say that all you want to see is basic information about all the processes running under your userid from your terminal. In this case, you need only enter the command name by itself:

ps

Here is some typical output using the UNIX version of ps:

  PID  TTY       TIME  CMD
 2262  tty1  00:00:00  bash
11728  tty1  00:00:00  ps

Here is the same output using the BSD version of ps:

  PID  TT  STAT     TIME  COMMAND
50384  p1  Ss    0:00.02  -sh (sh)
72883  p1  R+    0:00.00  ps

In general, ps displays a table in which each row contains information about one process. In the UNIX example above, we see information about two processes, #2262 and #11728. In the BSD example, we see information about processes #50384 and #72883.

Each column of the table contains a specific type of information. There are a variety of different columns you will see depending on which options you use. As a reference, Figure 26-6 shows the most common column headings. Let's use the information in this figure to decode the information in our examples.

Figure 26-6: The ps program: Column headings

The ps (process status) program displays information about processes. The information is organized into columns, each of which has a heading. Because the headings are abbreviations, they can be a bit cryptic.

For reference, here are the column headings you are likely to encounter using the basic options described in Figures 26-4 and 26-5. Most of the time, you can ignore the more esoteric columns. Still, I have explained them all in case you are curious. As you can see, the headings used by the UNIX options differ from those used by the BSD options. For the meaning of the state codes, see Figure 26-7.

UNIX Headings   Meaning
ADDR            virtual address within process table
C               processor utilization (obsolete)
CMD             name of command being executed
F               flags associated with the process
NI              nice number, for setting priority
PID             process ID
PPID            parent's process ID
PRI             priority (higher number = lower priority)
RSS             resident set size (memory management)
S               state code (D,R,S,T,Z)
STIME           start time of the process
SZ              size in physical pages (memory management)
TIME            cumulative CPU time
TTY             full name of controlling terminal
UID             userid
WCHAN           wait channel

BSD Headings    Meaning
%CPU            percentage of CPU (processor) usage
%MEM            percentage of real memory usage
CMD             name of command being executed
COMMAND         full command being executed
CPU             short-term CPU usage (scheduling)
JOBC            job control count
LIM             memory-use limit
NI              nice number, for setting priority
PAGEIN          total page faults (memory management)
PGID            process group number
PID             process ID
PPID            parent's process ID
PRI             scheduling priority
RE              memory residency time in seconds
RSS             resident set size (memory management)
SESS            session pointer
SL              sleep time in seconds
STARTED         time started
STAT            state code (O,R,S,T,Z)
TIME            cumulative CPU time
TSIZ            text size in kilobytes
TT              abbreviated name of controlling terminal
TTY             full name of controlling terminal
UID             userid
USER            user name
VSZ             virtual size in kilobytes
WCHAN           wait channel

Starting with the UNIX example, we see there are four columns labeled PID, TTY, TIME and CMD. Looking up these names in Figure 26-6, we see the following:

PID: process ID
TTY: name of controlling terminal
TIME: cumulative CPU time
CMD: name of command being executed

Thus, we can see that process #2262 is controlled by terminal tty1, has taken virtually no CPU time, and is running Bash. The information for process #11728 is pretty much the same. The only difference is that this process is running the ps command. What you see in this example is the minimum you see, because there are always at least two processes: your shell and the ps program itself. The ps process does not live long, however. In fact, it dies as soon as its output is displayed.

Now let us analyze the BSD example the same way. Looking at the output, we see five columns labeled PID, TT, STAT, TIME and COMMAND. Checking with Figure 26-6, we see the following:

PID: process ID
TT: name of controlling terminal
STAT: state code (O,R,S,T,Z)
TIME: cumulative CPU time
COMMAND: full command being executed

In general, the output of the BSD version of ps is straightforward, except for the STAT column, which we will get to in a moment.

Before we leave this section, I want to show you a small but interesting variation: when you use BSD options, ps displays abbreviated terminal names. Take a moment to look carefully at the TT column in the example above. Notice that you see only two characters, in this case p1. The full name of this terminal is actually ttyp1. (Terminal names are discussed in Chapter 23.)

— hint —

It is interesting to draw an analogy between ls and ps. Both of these programs examine specific data structures to find and display information for you.

The ls program (Chapter 25) examines the inode table, which is indexed by inumber, to display information about files. The ps program examines the process table, which is indexed by process ID, to display information about processes.

The ps Program: Choosing Options

The best approach to using ps is to start by asking two questions: Which processes am I interested in? What information do I want to see about each process? Once you decide what you need, all you need to do is use Figure 26-4 (for UNIX) or Figure 26-5 (for BSD) to choose the appropriate options.

For example, let's say you want to see the process ID of every process running on the system, as well as the process ID of all the parents. Let's do UNIX first. To start, we ask ourselves, which option will display all the processes on the system? From Figure 26-4, we see that this is the -e (everything) option.

Next, we must find the option that will display the process ID for each process and its parent. Checking with Figure 26-6, we see that the column headings we want are PID and PPID. Going back to Figure 26-4, we look for options that will show these two headings. All four choices will do the job, so let's use -f (full output) because it displays the least amount of output.

Putting it all together, we have figured out how to display the process ID for every process in the system as well as all the parents:

ps -ef

Most likely, this command will generate a lot of lines, so it is a good idea to pipe the output to less (Chapter 21) to display one screenful at a time:

ps -ef | less
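Once you have the -f output in front of you, the PID and PPID columns can be filtered mechanically. Here is a minimal sketch, assuming the -f column order used in this chapter (UID, PID, PPID and so on) and that awk is available on your system. The function name children_of and the sample lines are made up for illustration.

```shell
# Hypothetical helper: read "ps -ef" style output on standard input and
# keep the heading line plus every process whose PPID (column 3) matches.
children_of() {
    awk -v parent="$1" 'NR == 1 || $3 == parent'
}

# Sample lines in the format printed by ps -f (illustrative data only)
sample='UID        PID  PPID  C STIME TTY          TIME CMD
root         1     0  0 09:34 ?        00:00:01 init [5]
root      1879     1  0 09:36 ?        00:00:00 login -- harley
harley   12175  1879  0 14:14 tty2     00:00:00 -bash'

# Show the heading and the children of process #1
printf '%s\n' "$sample" | children_of 1
```

On a live system, you would feed the function real data instead of the sample, for example: ps -ef | children_of 1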

Now let's do the same analysis for the BSD version of ps. To start, we look at Figure 26-5 to see which option will display all the processes in the system. The answer is ax. Next, we look for the options that will display the parent's process ID. We have two choices, j and l. Let's choose j because it generates less output. Thus, the BSD version of the command we want is:

ps ajx | less

As an exercise, let's see what it takes to trace the parentage of one of our processes as far back as possible. To start, we will use a UNIX version of ps to display our current processes:

ps

The output is:

  PID TTY         TIME CMD
12175 tty2    00:00:00 bash
12218 tty2    00:00:00 ps

Our goal is to trace the parentage of the shell, process #12175. To start, we ask the question, what option will display information about one specific process? Looking at Figure 26-4, we see that we can use -p followed by a process ID. Next we ask, which option will display the parent's process ID? The answer is -f. Thus, to start our search, we use the command:

ps -f -p 12175

The output is:

UID       PID  PPID  C  STIME  TTY      TIME  CMD
harley  12175  1879  0  14:14  tty2 00:00:00  -bash

From this we can see that the parent of process #12175 is process #1879. Let us repeat the same command with the new process ID:

ps -f -p 1879

The output is:

UID      PID  PPID  C  STIME  TTY     TIME  CMD
root    1879     1  0  09:36  ?   00:00:00  login -- harley

Notice that the parent of process #1879 is process #1. This is the init process we discussed earlier in the chapter.

Before we move on, there are two interesting points I wish to draw to your attention. First, notice the ? character in the TTY column. This indicates that the process does not have a controlling terminal. We call such processes "daemons", and we will talk about them later in the chapter. Second, we see that process #1879 is running the login program under the auspices of userid root. This is because login is the program that enables users to log in to the system. You may remember that, in Chapter 4, we used the same program to log out and leave the terminal ready for a new user.

To finish our search for the ultimate parent, let us display information about process #1:

ps -f -p 1

The output is:

UID      PID  PPID  C  STIME  TTY     TIME  CMD
root       1     0  0  09:34  ?   00:00:01  init [5]

We have reached the end of our genealogical journey. As we discussed earlier in the chapter, the parent of process #1 (the init process) is process #0 (the idle process). Notice, by the way, that process #1 ran the init command to boot the system into runlevel 5 (multiuser mode with a GUI). We discuss runlevels in Chapter 6.
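The search we just carried out by hand can be automated with a short loop. What follows is only a sketch. It assumes your version of ps supports the POSIX -o option (which lets you name the columns you want, and suppresses the heading when the name is followed by =) along with -p. The function name trace_parents is invented for this example.

```shell
# Hypothetical helper: print each ancestor of a process, one per line,
# starting with the process itself and stopping when the PPID reaches 0.
trace_parents() {
    pid=$1
    while [ -n "$pid" ] && [ "$pid" -gt 0 ]; do
        # "-o comm=" prints only the command name, with no heading
        name=$(ps -o comm= -p "$pid") || break
        printf '%s %s\n' "$pid" "$name"
        # "-o ppid=" prints only the parent's process ID
        pid=$(ps -o ppid= -p "$pid" | tr -d ' ')
    done
}

trace_parents $$    # $$ is the PID of the current shell
```

The output starts with your shell and works its way up the family tree. The exact list of ancestors depends on how you logged in, so it will differ from one system to another.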

The ps Program: States

Let us now conclude our discussion of the ps command by talking about states. As we discussed earlier in the chapter, processes are generally in one of three states: running in the foreground; running in the background; or suspended, waiting for a signal to resume execution. There are also other less common variations, such as the zombie state, when a process has died but its parent is not waiting for it.

To look at the state of a process, you use ps to display the S column (with UNIX options) or the STAT column (with BSD options). Let's start with the UNIX version. Checking with Figure 26-4, we see that the UNIX options that display the S column are -l and -ly. We'll use -ly because it displays less output. Thus, to display a list of all your processes including their state, you would use:

ps -ly

Here is some typical output from a Linux system:

S UID  PID PPID C PRI NI RSS  SZ WCHAN  TTY      TIME CMD
S 500 8175 1879 0  75  0 464 112 wait   tty1 00:00:00 bash
T 500 8885 8175 0  75  0 996 366 finish tty1 00:00:00 vim
R 500 9067 8175 2  78  0 996 077 -      tty1 00:00:02 find
R 500 9069 8175 0  78  0 800 034 -      tty1 00:00:00 ps

The state is described by the one-letter code in the S column. The meanings of the codes are explained in Figure 26-7. In this case, we can see that the first process in the list, process #8175 (the shell), has a state code of S. This means it is waiting for something to finish. (In particular, it is waiting for the child process #9069, the ps program itself.)

Figure 26-7: The ps program: Process state codes

With certain options, the ps command displays a column of data indicating the state of each process. With the UNIX options, the column is labeled S, and contains a single-character code. With the BSD options, the column is labeled STAT and contains a similar code, sometimes followed by 1-3 other, less important characters. Here are the meanings of the codes, which vary slightly from one system to another.

Linux, FreeBSD
D   Uninterruptible sleep: waiting for an event to complete (usually I/O; D="disk")
I   Idle: sleeping for longer than 20 seconds (FreeBSD only)
R   Running or runnable (runnable = waiting in the run queue)
S   Interruptible sleep: waiting for an event to complete
T   Suspended: either by a job control signal or because it is being traced
Z   Zombie: terminated, parent not waiting

Solaris
O   Running: currently executing (O="onproc")
R   Runnable: waiting in the run queue
S   Sleeping: waiting for an event to complete (usually I/O)
T   Suspended: either by a job control signal or because it is being traced
Z   Zombie: terminated, parent not waiting

The second process, #8885, has a state code of T, which means it is suspended. In this case, the vim editor was running in the foreground, when it was suspended by pressing ^Z. (This procedure is described earlier in the chapter.)

The third process, #9067, has a state code of R. This means it is running. In fact, it is a find command (Chapter 25) that is running in the background.

Finally, the last process is the ps program itself. It also has a state code of R, because it too is running — in this case, in the foreground. It is this process that has displayed the output you are reading. In fact, by the time you see the output, the process has already terminated, and the shell process (#8175) has regained control.

Before we leave this example, I would like to point out something interesting. By looking at the PIDs and PPIDs, you can see that the shell is the parent of all the other processes. (This should make sense to you.)
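Because the state code is the first column of the -ly output, it is easy to pick out the processes in one particular state. Here is a small sketch that does so with awk. The function name in_state is invented, and the test data is simply the example output shown above.

```shell
# Hypothetical helper: keep the heading and every process whose state
# code (column 1 of "ps -ly" output) matches the requested letter.
in_state() {
    awk -v code="$1" 'NR == 1 || $1 == code'
}

# The example output from above, used here as test data
sample='S UID  PID PPID C PRI NI RSS  SZ WCHAN  TTY      TIME CMD
S 500 8175 1879 0  75  0 464 112 wait   tty1 00:00:00 bash
T 500 8885 8175 0  75  0 996 366 finish tty1 00:00:00 vim
R 500 9067 8175 2  78  0 996 077 -      tty1 00:00:02 find
R 500 9069 8175 0  78  0 800 034 -      tty1 00:00:00 ps'

# Show only the suspended processes (in this data, just vim)
printf '%s\n' "$sample" | in_state T
```

With live data, you would use: ps -ly | in_state T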

Now let's discuss how to check states with BSD options. To start, take a look at Figure 26-6. The heading we want to display is STAT. Now look at Figure 26-5. Notice that all the variations of ps display the STAT column, including ps by itself with no options. If you are using a pure BSD system, all you need to use is:

ps

If you are using a mixed system (as is the case with Linux), you will have to force BSD output by using one of the BSD options. My suggestion is to choose j because it generates the least amount of output:

ps j

Here is some typical output, using the plain ps command on a FreeBSD system. (With the j option, the output would be similar but with more columns.)

  PID  TT  STAT     TIME   COMMAND
52496  p0  Ss    0:00.02   -sh (sh)
52563  p0  T     0:00.02   vi test
54123  p0  Z     0:00.00   (sh)
52717  p0  D     0:00.12   find / -name harley -print
52725  p0  R+    0:00.00   ps

The first process, #52496, is the shell.(*) Notice that the STAT column has more than one character. The first character is the state code. The second character gives esoteric technical information we can safely ignore. (If you are interested, see the man page.) In this case, the state code is S. Looking this up in Figure 26-7, we see that the process is waiting for something to finish. Specifically, it is waiting for a child process, #52725, the ps program.

* Footnote

As you will remember from Chapter 11, the name of the old Bourne shell was sh. You might be wondering, is this a Bourne shell? The answer is no; the Bourne shell has not been used for years. It happens that the FreeBSD shell is also named sh.

The second process, #52563, has a state code of T, meaning it is suspended. In this case, vi was suspended by pressing ^Z.

The third process, #54123, is an old shell with a state code of Z, which means it is a zombie. This is an unusual finding. Somehow, the process managed to die while its parent was not waiting for it. (See the discussion on zombies earlier in the chapter.)

The fourth process, #52717, is a find program running in the background. It has a state code of D, indicating that it is waiting for an I/O event to complete (in this case, reading from the disk). This makes sense, as find does a lot of I/O. You must remember, however, that whenever you use ps, you are looking at an instantaneous snapshot. As it happened, we caught find when it was waiting for I/O. We might just as easily have found it running, in which case the state code would have been R.

Finally, the last process, #52725, is the ps program itself. It has a state code of R, because it is running in the foreground.
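As we have seen, only the first character of the STAT column is the state code. The following sketch strips away the extra characters and prints each PID next to its one-letter code. It assumes the column layout of the plain BSD listing above, where STAT is the third column (with other options, such as j, the position will be different). The function name state_codes is invented.

```shell
# Hypothetical helper: for each process in a plain BSD-style listing,
# print the PID and the first character of the STAT column (column 3).
state_codes() {
    awk 'NR > 1 { print $1, substr($3, 1, 1) }'
}

# The FreeBSD example output from above, used here as test data
sample='  PID  TT  STAT     TIME   COMMAND
52496  p0  Ss    0:00.02   -sh (sh)
52563  p0  T     0:00.02   vi test
54123  p0  Z     0:00.00   (sh)
52717  p0  D     0:00.12   find / -name harley -print
52725  p0  R+    0:00.00   ps'

printf '%s\n' "$sample" | state_codes
```

The first line of output is "52496 S" and the last is "52725 R", which matches our analysis: the Ss and R+ entries have been reduced to their state codes.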

Before we leave this example, let me draw your attention to an interesting point. If you look at the rightmost column, COMMAND, you will see that it displays the entire command being executed. This column is only available with the BSD options. With the UNIX options, all you will ever see is the CMD column, which only shows the name, not the full command.(*)

* Footnote

With Solaris, the CMD column does show the full command.

— hint —

If you are using a system like Linux that supports both the UNIX and BSD options, you can pick the options that best serve your needs. For example, let's say you want to display a list of processes along with the full command (COMMAND), not the command name (CMD). If you have access to BSD options, you can use:

ps j

There is no easy way to do this using only UNIX options. (Yay for BSD!)

— hint for paranoids —

On a multiuser system, you can amuse yourself by using ps to snoop on what other people are doing. In particular, when you use BSD options, you can look in the COMMAND column and see the full commands that other users have entered. (If your system doesn't support BSD options, you can do the same thing with the w program; see Chapter 8.)

At first this will seem like harmless fun, until you realize that everyone else on the system can see what you are doing as well.

So be careful. If you are a guy, what do you think the system administrator or your girlfriend(*) would think if they were to snoop on you and see that, for the last hour, you have been working with the command vi pornography-list?

* Footnote

You need to be even more careful if your girlfriend is the system administrator.

Monitoring System Processes: top, prstat

To look at your own processes, you can use the ps command. However, what if you want to examine the system as a whole? To be sure, ps has options that will display a large variety of information about all the processes on the system. However, ps has a major limitation: it shows you a static snapshot of the processes, how they looked at a particular instant in time. Because processes are dynamic, this limitation becomes important when you need to watch how the various processes are changing from moment to moment. In such cases, you can use the top program to display overall system statistics updated every few seconds, as well as information about the most important processes as they change in real time.

The name of the program comes from the fact that it shows you the "top" processes, that is, the ones that are using the most CPU time. The syntax for using top is a bit complicated and can vary slightly from one system to another. Here is the basic syntax that you would use with Linux. With other systems, the options will vary, so you will have to check your online manual.

top [-d delay] [-n count] [-p pid[,pid]...]

where delay is the refresh interval in seconds; count is the total number of times to refresh; and pid is a process ID.

The top program is available with most Linux and BSD systems. If your system does not have top, there will usually be an equivalent program. For example, with Solaris, you can use prstat instead. Because the options can vary depending on your version of top, it is worth a moment of your time to check with the man page on your system.

To watch how top works, enter the command by itself:

top

To quit the program at any time, press q or ^C.

Like less and vi, top works in raw mode (see Chapter 21). This allows it to take over the command line and screen completely, displaying lines and changing characters as necessary. As an example, take a look at Figure 26-8, where you see an abbreviated example of some typical output.

Figure 26-8: The top program

The top program is used to display dynamic information about the "top" processes on the system, that is, the processes that are using the most CPU time. What you see here is an abbreviated example of the type of output top displays. The output is updated at regular intervals. As top executes, you can type commands to control its behavior. For help, press h; to quit, press q or ^C.

top - 9:10:24 up 14:50, 7 users, load average: 0.32,0.17,0.05
Tasks: 97 total, 1 running, 92 sleeping, 4 stopped, 0 zombie
Cpu(s): 1.7% us, 2.0% sy, 0.0% ni, 96.4% id, 0.0% wa
Mem: 385632k total, 287164k used, 98468k free, 41268k buffer
Swap: 786424k total, 0k used, 786424k free, 156016k cached

 PID USER   PR NI VIRT  RES  SHR S %CPU %MEM  TIME+  COMMAND
4016 harley 16  0 2124  992  780 R  1.3  0.3 0:00.35 top
3274 harley 15  0 7980 1808 1324 S  0.3  0.5 0:01.14 sshd
   1 root   16  0 1996  680  588 S  0.0  0.2 0:01.67 init
   2 root   34 19    0    0    0 S  0.0  0.0 0:00.00 ksoftirqd
   3 root   RT  0    0    0    0 S  0.0  0.0 0:00.00 watchdog
   4 root   10 -5    0    0    0 S  0.0  0.0 0:00.00 events
   5 root   10 -5    0    0    0 S  0.0  0.0 0:00.01 khelper
   6 root   11 -5    0    0    0 S  0.0  0.0 0:00.00 kthread
   8 root   10 -5    0    0    0 S  0.0  0.0 0:00.02 kblockd
  11 root   10 -5    0    0    0 S  0.0  0.0 0:00.00 khubd

The output can be divided into two parts. The top five lines show information about the system as a whole. In our example, the top line shows the time (9:10 AM), how long the system has been running (14 hours and 50 minutes), and the number of users (7). There is also a wealth of other, more technical information showing statistics about processes, CPU time, real memory (memory), and virtual memory (swap space).

Below the system information, you see data describing the various processes, one process per line, listed in order of CPU usage. In our example, the system is quiet. In fact, top itself is the top process.

The top program is powerful because it automatically refreshes the statistics at regular intervals. The default interval varies depending on your version of top. For example, on one of my Linux systems, the default is 3 seconds; on my FreeBSD system, it is 2 seconds; on my Solaris system (using prstat), it is 5 seconds. To change the refresh rate, use the -d (delay) option. For example, to tell top to refresh itself every second, you would use:

top -d 1

Some versions of top allow you to enter even shorter intervals. If your system supports it, try running with a very fast refresh rate, such as:

top -d 0.1

On a busy system, this makes for a fascinating display.(*)

* Footnote

Tip for guys: If you have a hot date you are eager to impress, invite her back to your place and have her sit in front of your computer. Then log into a busy Unix or Linux system and run top with a refresh rate of 1 second or less. If that doesn't impress her, nothing will.

Because top works in raw mode, you can type various commands as the program runs. The most important command is q, which quits the program. The next most important command is h (help) or ?, which displays a summary of all the commands. A third command — not always documented — is the <Space> key. This forces top to refresh the display at that moment. Pressing <Space> is useful when you have chosen a slow refresh rate, and you need an instant update. I won't go over all the commands here, as they are very technical. However, when you have a moment, press h and see what is available with your version of top.

For extra control, there are two other options you can use. By default, top refreshes indefinitely. The -n option lets you tell top to refresh only a certain number of times. For example, to refresh the display 6 times, once every 10 seconds, you would use:

top -d 10 -n 6

In this case, the program will run for only 60 seconds.

To display information about a specific process, use -p followed by the process ID, for example:

top -p 3274

To specify more than one process ID, separate them with commas. For example, the following command uses a refresh rate of 1 second and displays information about processes #1 through #5:

top -d 1 -p 1,2,3,4,5
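When the process IDs come from another command, typing the commas by hand becomes tedious. Here is a small sketch of a helper that builds the comma-separated list for you. The name pid_list is made up, and the trick relies on a standard shell feature: inside "$*", the arguments are joined using the first character of the IFS variable.

```shell
# Hypothetical helper: join its arguments with commas, the form that
# the -p option of top expects.
pid_list() {
    # Inside "$*", the shell joins the arguments using the first
    # character of IFS; the parentheses keep the IFS change local.
    ( IFS=,; printf '%s\n' "$*" )
}

pid_list 1 2 3 4 5
# To use it with top (interactive, so shown here as a comment):
#   top -d 1 -p "$(pid_list 1 2 3 4 5)"
```

The call to pid_list displays 1,2,3,4,5, which is exactly what we typed by hand in the example above.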

In general, top is used more by system administrators and programmers than by regular users. Typically, an admin will use top for performance monitoring. For example, he may want to see how a new application is doing on a server, or he may want to evaluate two different database programs to see which one works more efficiently. Programmers will often use top to test how a program performs under various workloads.

You will find that ps suits your needs more often than top. However, in certain situations, top can be invaluable. For example, if you are using a system that, all of a sudden, becomes abnormally slow, you can use top to find out what is happening.

Displaying a Process Tree: pstree, ptree

So far, we have discussed two important tools you can use to display information about processes: ps to look at static information, and top to look at dynamic information. A third tool, pstree, is useful when you want to understand the relationships between processes.

Earlier in the chapter, I explained that every process (except the very first one) is created by another process. When this happens, the original process is called the parent; the newly created process is called the child. Whenever a new process is created, it is given an identification number called a process ID or PID.

Towards the end of the startup procedure, the kernel creates the very first process, the idle process, which is given a PID of #0. After performing a number of tasks, the idle process creates the second process, the init process, which is given a PID of #1. The idle process then goes into a permanent sleep (hence the name).

The job of the init process is to create a variety of other processes. Most of these third-generation processes are daemons (I'll explain the name later in the chapter), whose job is to wait for something to happen and then react appropriately. In particular, there are daemons that do nothing but wait for users to log in. When a user is ready to log in, the daemon creates another process to handle the task. The login daemon then creates another process to run the user's shell. Finally, whenever the shell needs to execute a program for the user, the shell creates yet another process to carry out the job.

Although this arrangement seems complicated, it can be simplified enormously by making one simple observation: every process (except the first one) has a single parent. Thus, it is possible to imagine all the processes in the system arranged into a large, tree-structured hierarchy with the init process at the root of the tree. We call such data structures PROCESS TREES, and we use them to show the connections between parent processes and their children.
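To see why a single parent per process is enough to define a tree, consider the following sketch. Given nothing more than a list of process IDs, parent process IDs and names, a few lines of awk can rebuild the whole hierarchy. The function name print_tree and the sample data are invented for illustration, and the layout is deliberately plain.

```shell
# Hypothetical helper: read lines of "PID PPID NAME" and print the
# process tree rooted at PID 1, indenting each generation by two spaces.
print_tree() {
    awk '
    {
        name[$1] = $3
        kids[$2] = kids[$2] " " $1    # add this PID to the list kept for its parent
    }
    function show(pid, depth,    indent, list, n, i) {
        indent = ""
        for (i = 0; i < depth; i++) indent = indent "  "
        print indent name[pid] "(" pid ")"
        n = split(kids[pid], list, " ")
        for (i = 1; i <= n; i++) show(list[i], depth + 1)
    }
    END { show(1, 0) }
    '
}

# Illustrative data only: PID, PPID, name
print_tree <<'EOF'
1 0 init
2 1 ksoftirqd
3 1 watchdog
6 1 kthread
8 6 kblockd
11 6 khubd
EOF
```

If your ps supports the POSIX -o option (an assumption), you could try feeding the function live data with: ps -e -o pid= -o ppid= -o comm= | print_tree. Be aware that only the first word of each name is shown, and that processes not descended from process #1 are left out.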

You can display a diagram of any part of the system process tree by using the pstree program. For example, you can display the entire process tree, starting from the init process. Or, you can display a subtree based on a specific PID or userid. The syntax to use is:

pstree [-aAcGnpu] [ pid | userid ]

where pid is a process ID, and userid is a userid.

The pstree program is available with most Unix systems. If your system does not have pstree, there will sometimes be an equivalent program. For example, with Solaris, you can use ptree instead. (See the online manual for the details.) On other systems, the ps command has special options to display process trees. You can try ps f or ps -H, although the output won't be as good as with pstree.

To see how pstree works, start by entering the command without any options. By default, pstree draws the process tree for the entire system, starting with the init process. This generates a lot of lines, so it is a good idea to pipe the output to less (Chapter 21) to display one screenful at a time:

pstree | less

Here is an abbreviated example showing the first eight lines of output on a Linux system. Notice that the root of the tree — the init process — is at the top of the diagram:

init-+-apmd
     |-atd
     |-automount
     |-crond
     |-cups-config-dae
     |-cupsd
     |-2*[dbus-daemon---{dbus-daemon}]
     |-dbus-launch

As you look at this process tree, I want you to notice several things. First, at each level, the tree is sorted alphabetically by process name. This is the default, which you can change using the -n option (see below).

Next, notice the notation 2* in the second to last line. This means that there are two identical subtrees. Using such notation allows pstree to create a more compact diagram. If you want pstree to expand all subtrees, even the identical ones, use the -c (do not compact) option.

Finally, you can see that pstree uses plain ASCII characters to draw the branches of the tree. With some terminals, pstree will use special line drawing characters instead. This enables it to draw continuous lines. If for some reason your output doesn't look right, you can use -A to force the use of ASCII characters or -G to force the use of line drawing characters. Take a moment to experiment and see which type of output looks best on your system:

pstree -A | less
pstree -G | less

Aside from the display options, there are options that enable you to control which information is displayed. My two favorite options are -p, which displays the PID of each process, and -n, which sorts the tree by PID, instead of by process name:

pstree -np

Here are the first eight lines of the output using these options. We see that the process tree starts with process #1 (the init process). This process has a number of children: process #2, #3, #4, #5, #6, and so on. Process #6 has children of its own: #8, #11, #13, #80, and so on.

init(1)-+-ksoftirqd(2)
        |-watchdog(3)
        |-events(4)
        |-khelper(5)
        |-kthread(6)-+-kblockd(8)
        |            |-khubd(11)
        |            |-kseriod(13)
        |            |-pdflush(80)

By default, pstree draws the entire process tree starting from the root, that is, starting from process #1. There will be times, however, when you will be most interested in parts of the tree. In such cases, there are two ways you can limit the output. If you specify a PID, pstree will display the subtree descended from that particular process.

Here is an example. You use Bash for your shell. From the shell, you have two processes in the background: make and gcc. You also have two processes that are suspended: vim and man. You want to display a process tree showing only these processes. To start, you use ps or echo $$ to find out the PID of your shell. It happens to be #2146. You then enter the following command:

pstree -p 2146

Here is the output:

bash(2146)-+-gcc(4252)
           |-pstree(4281)
           |-make(4276)
           |-man(4285)---sh(4295)---less(4301)
           `-vim(4249)

Notice that man has created a child process to run a new shell (#4295), which has created another child process (#4301) to run less. This is because man calls upon less to display its output.

The second way in which you can restrict the range of the process tree is to specify a userid instead of a PID. When you do this, pstree displays only those processes that are running under the auspices of that userid, for example:

pstree -p harley

The last two options I want to mention are used to show extra information along with the process name. The -a (all) option displays the entire command line for each process, not just the name of the program. The -u (userid change) option marks the transition whenever a child process runs under a different userid than its parent.

Thinking About How Unix Organizes Processes and Files: fuser

Before we move on, I want to take a moment and ask you to think about the similarities between how Unix organizes processes and files.

Both processes and files can be thought of as existing within hierarchical trees with the root at the top. The root of the process tree is process #1 (the init process). The root of the file tree is the root directory (see Chapter 23). Within the process tree, every process has a single parent process above it. Within the file tree, every subdirectory has a single parent directory above it. To display the file tree, we use the tree program (Chapter 24). To display the process tree, we use the pstree program.

With a little more thought, we can find even more similarities. Every process is identified by a unique number called a process ID. Every file is identified by a unique number called an inumber. Internally, Unix keeps track of processes by using a process table, indexed by process ID. Within the process table, each entry contains information about a single process. Similarly, Unix keeps track of files by using an inode table, indexed by inumber. Within the inode table, each entry (the inode) contains information about a single file.

However, for a very good reason, this is about as far as we can push the analogy. Why? Because there is a fundamental difference between processes and files. Processes are dynamic: at every instant, the data that describes them is changing. Files are comparatively static.

For example, to display information about files, we use the ls program (Chapters 24 and 25), which simply looks in the inode table for its data. To display information about processes, we use the ps and top programs, and gathering information about processes is trickier. To be sure, some basic data can be found in the process table. However, most of the dynamic information must come from the kernel itself, and obtaining such information is not as simple as looking in a table.

In order to procure the data they need to do their jobs, both ps and top must use a type of pseudo file called a proc file (see Chapter 23). Within the /proc directory, every process is represented by its own proc file. When a program needs information about a process, it reads from that process' proc file. This, in turn, triggers a request to the kernel to supply the necessary data. The whole thing happens so quickly that it doesn't occur to you that finding process information is more complicated than finding file information.
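If you are curious, you can look at this data yourself. The sketch below is specific to Linux, where (as an assumption about your system) each process has a directory named /proc followed by its process ID, containing a pseudo file called status. Other Unix systems organize /proc differently or do not have it at all. The function name proc_summary is invented.

```shell
# Linux-specific sketch: show a few lines of the status pseudo file
# that the kernel provides for a process under /proc.
proc_summary() {
    status_file="/proc/$1/status"
    if [ ! -r "$status_file" ]; then
        echo "cannot read $status_file" >&2
        return 1
    fi
    # Keep only the name, state, process ID and parent's process ID
    grep -E '^(Name|State|Pid|PPid):' "$status_file"
}

# $$ is the PID of the current shell
proc_summary $$ || echo "no Linux-style /proc on this system"
```

Notice that the State line uses the same one-letter codes we discussed with ps. This is no coincidence: it is the kind of data ps and top draw on when they build their displays.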

You might be wondering, are there any tools that bridge the gap between processes and files? Yes, there are. One of the most interesting is fuser, a system administration tool that lists all the processes that are using a specific file. For example, let's say you enter the following command to run the find program (Chapter 25) to search for files named foo. Notice that the program is run in the background, and that it redirects the standard output to a file named bar(*):

find / -name foo -print > bar 2>/dev/null &

* Footnote

See Chapter 9 for a discussion of the names foo and bar.

When the program starts, the shell displays the following message, showing you the job ID (3) and the process ID (3739):

[3] 3739

Since standard output is redirected to bar, you know that this particular file will be in use while the program is running. To check this, you enter the command:

fuser bar

Here is the output:

bar: 3739

As you can see, the file bar is in use by process #3739. In this way, fuser provides an interesting example of how a single tool can gather information about both processes and files at the same time.
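If you would like to repeat this experiment without searching the whole filesystem, here is a self-contained sketch. It uses sleep as a stand-in for find, a temporary file created by mktemp as a stand-in for bar, and kill (discussed in the next section) to clean up afterwards. These choices are mine, for illustration only. The sketch also assumes that fuser writes the process IDs to standard output and everything else to standard error, which is the usual behavior.

```shell
# Hold a scratch file open with a background job, then ask fuser about it.
scratch=$(mktemp)            # temporary file standing in for "bar"
sleep 30 > "$scratch" &      # standard output is redirected, so the file stays open
bgpid=$!                     # $! is the PID of the most recent background job
sleep 1                      # give the job a moment to open the file

if command -v fuser >/dev/null 2>&1; then
    # Capture the PIDs that fuser reports for the scratch file
    users=$(fuser "$scratch" 2>/dev/null) || users=""
    echo "background job: $bgpid"
    echo "fuser reports: $users"
else
    echo "fuser is not in the search path"
fi

kill "$bgpid"                # terminate the background job
rm -f "$scratch"             # remove the scratch file
```

If all goes well, the two numbers displayed will be the same. If the script tells you that fuser is not in the search path, read on.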

If you try experimenting with fuser, you may run into a problem that is worth discussing. The fuser program is meant to be used by system administrators. For this reason, it is commonly stored with other such tools in one of the admin directories, such as /sbin (see Chapter 23). However, unless you are logged in as superuser, it is unlikely that the admin directories will be in your search path. This means that, when you type in the fuser command, the shell will not be able to find the program.

When you encounter such a problem, simply use whereis (Chapter 25) to find the location of fuser on your system. For example:

whereis fuser

Here is some typical output:

fuser: /sbin/fuser /usr/share/man/man1/fuser.1.gz

In this case, the first path is the location of the program; the second path is the location of the man page. To run fuser, all you need to do is show the shell where to find the program:

/sbin/fuser bar

This is the technique to use when you want to run a program whose directory is not in your search path.
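
If you find yourself doing this often, the idea can be wrapped up in a small shell function. This is only a sketch: the function name find_program and the list of admin directories are assumptions, and your system may keep such tools elsewhere. The function checks the search path first (using the shell's command -v) and then looks in a few likely directories.

```shell
# Print where a program lives, looking beyond the search path if necessary.
# The directory list is an assumption; adjust it for your system.
find_program() {
    if command -v "$1" > /dev/null 2>&1; then
        command -v "$1"           # found by way of the search path
        return 0
    fi
    for dir in /sbin /usr/sbin /usr/local/sbin; do
        if [ -x "$dir/$1" ]; then
            echo "$dir/$1"
            return 0
        fi
    done
    echo "$1: not found" >&2
    return 1
}

# Example: locate fuser so it can be run by its full pathname.
if fuser_path=$(find_program fuser); then
    echo "fuser is at $fuser_path"
fi
```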

Killing a Process: kill

The kill program has two uses: to terminate a process and to send a signal to a process. In this section, we'll talk about termination. In the next section, we'll discuss the more general topic of signals.

As a rule, programs run until they finish on their own or until you tell them to quit. You can usually stop a program prematurely by pressing ^C to send the intr signal (see Chapter 7), or by typing a quit command. However, these methods won't always work. For instance, on occasion, a program will freeze and stop responding to the keyboard. A similar problem arises when you want to terminate a program that is running in the background. Because background processes don't read from the keyboard, there is no way to reach the program directly.

In such cases, you can terminate a program by using the kill program. When you terminate a program in this way, we say that you KILL it. The syntax to use is:

kill [-9] pid... | jobid...

where pid or jobid identifies the process.

Most of the time, you will want to kill a single process. First you will use ps or jobs to find the process ID or job ID of the process you want to kill. Then you will use kill to carry out the actual termination. Consider the following example. You have entered the command below to run the make program in the background:

make game > makeoutput 2> makeerrors &

Some time later, you decide to kill the process. The first step is to find out the process ID. You enter:

ps

The output is:

 PID TTY         TIME CMD
2146 tty2    00:00:00 bash
5505 tty2    00:00:00 make
5534 tty2    00:00:00 ps

The process ID you want is 5505. To kill this process, you enter:

kill 5505

The shell will kill the process and display a message, for example:

[2]  Terminated   make game >makeoutput 2>makeerrors

This means the process that was running the program make game has been killed. The number at the beginning of the line means that the process was job #2.
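
Here is the same sequence as a small script you can run safely. It uses sleep as a stand-in for a long-running program (an assumption made only so the example is harmless), and it takes the process ID from the shell's $! variable, which holds the process ID of the most recent background command, instead of reading it from ps.

```shell
# Start a long-running program in the background and note its process ID.
sleep 300 &
pid=$!

# With no option specified, kill asks the process to terminate.
kill "$pid"

# Collect the exit status. For a process ended by a signal, shells such as
# bash and dash report 128 plus the signal number (15 in this case).
status=0
wait "$pid" || status=$?
echo "process $pid ended with status $status"
```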

An alternative way to list your processes is to use the jobs -l command. Let's say you had used the following command instead of ps:

jobs -l

Here is what you would have seen:

[2]-  5505 Running    make game >makeoutput 2>makeerrors &

Again, you could use the command kill 5505 to kill the make process. However, there is an alternative. You can specify a job number in the same manner as when you use the fg and bg commands (see Figure 26-3). Thus, in this case, any of the following commands would work:

kill 5505
kill %-
kill %2
kill %make
kill %?game

Here is another common situation. A foreground process becomes so unresponsive, you can't stop it no matter what you type, including ^C. You have two choices. First, you can try pressing ^Z to suspend the process. If this is successful, you can then use ps or jobs to find the process and terminate it with kill.

Alternatively, you can open up a new terminal window and use ps -u or ps U, followed by your userid, to list all the processes running under your userid. You can then identify the runaway process and terminate it with kill. In fact, this is sometimes the only way to kill a process that is off by itself in deep space.

If you are using a remote Unix host, you have a third choice. If all else fails, simply disconnect from the host. On some systems, when your connection drops, the kernel automatically kills all your processes. Of course, this will also kill any other programs that may be running.

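When you kill a job by its job ID, the signal normally goes to every process in that job, so a parent and its children are terminated together. Thus, you can often kill an entire group of related processes with a single command. (In Unix, family ties run deep.) Be aware, however, that killing a single parent by its process ID does not automatically kill its children, which may carry on as orphans.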

When ^C or a quit command doesn't work, kill will usually do the job. However, on occasion, even kill will fail. In such cases, there is a variation that always works: specify the option -9 as part of the command. This sends the "sure kill" signal 9 (which we will discuss in the next section). For example:

kill -9 5505
kill -9 %2

Sending signal 9 will always work. However, it should be your last choice, because it kills too quickly. When you use kill -9, it does not allow the process the opportunity to release any resources it may be using. For example, the process will not be able to close files (which may result in data loss), release memory, and so on. Using kill -9 may also result in abandoned child processes which, later, will not be able to die properly. (See the discussion on orphans earlier in the chapter.)

Although the kernel will usually clean up the mess, it is smart to try all the other techniques before resorting to drastic measures.
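
One way to follow this advice in a script is to try an ordinary kill first and to use kill -9 only if the process does not go away. The following is a sketch, not a standard tool: the function name terminate and the length of the waiting period are assumptions made for illustration. It relies on kill -0, which sends no signal at all and merely reports whether the process still exists.

```shell
# Try an ordinary kill first; use kill -9 only if the process survives.
# Usage: terminate pid [seconds-to-wait]
terminate() {
    pid=$1
    grace=${2:-5}
    kill "$pid" 2> /dev/null || return 0           # nothing to kill
    while [ "$grace" -gt 0 ]; do
        sleep 1
        kill -0 "$pid" 2> /dev/null || return 0    # it is gone
        grace=$((grace - 1))
    done
    echo "process $pid is still running; using kill -9" >&2
    kill -9 "$pid"
}

# Demonstration: a program that ignores the ordinary kill signal.
sh -c 'trap "" TERM; exec sleep 300' &
stubborn=$!
sleep 1                    # let the program set itself up
terminate "$stubborn" 2
status=0
wait "$stubborn" || status=$?
echo "process $stubborn ended with status $status"
```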

Sending a Signal to a Process: kill

As we have just discussed, you can use the kill program to terminate a process that is otherwise unreachable. However, kill is not merely a termination tool. It is actually a powerful program that can send any signal to any process. When used in this way, the more general form of the syntax is:

kill [-signal] pid... | jobid...

where signal is the type of signal you want to send, and pid or jobid identifies a process, as discussed in the previous section.

In Chapter 23, we encountered the concept of interprocess communication or IPC, the exchanging of data between two processes. At the time, we were discussing the use of named pipes as a means of sending data from one process to another. The purpose of the kill program is to support a different type of IPC, specifically, the sending of a very simple message called a SIGNAL. A signal is nothing more than a number that is sent to a process to let it know that some type of event has occurred. It is up to the process to recognize the signal and do something. When a process does this, we say that it TRAPS the signal.
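
Shell scripts can trap signals too, using the shell's built-in trap command. In the sketch below, a background worker (a subshell standing in for a real program) traps the TERM signal, the one kill sends by default, so that it can remove its scratch file before exiting. The names scratch and worker are assumptions made for the example.

```shell
scratch=$(mktemp)

# The worker traps TERM, tidies up, and then exits on its own terms.
(
    trap 'rm -f "$scratch"; echo "worker: cleaned up"; exit 0' TERM
    while :; do
        sleep 1
    done
) &
worker=$!

sleep 1                 # give the worker time to set its trap
kill -TERM "$worker"    # send the signal
status=0
wait "$worker" || status=$?
echo "worker ended with status $status"
```

Because the worker handles the signal itself and finishes with exit 0, the shell sees a normal exit rather than a process that was cut off in the middle of its work.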

In Chapter 7, we encountered signals during our discussion of several of the special key combinations, such as ^C and ^Z. When you press one of these keys, it sends a signal to the current foreground process. For example, when you press ^C, it sends signal 2.

There is a large variety of signals used within Unix, most of which are of interest only to system programmers. For reference, Figure 26-9 contains a list of the most commonly used signals. Notice that each signal is known by a number, as well as a standardized name and abbreviation (both of which should be typed in uppercase letters).

Figure 26-9: Signals

Signals are used as a simple but important form of interprocess control. This list shows the most commonly used signals, along with their names. When you use kill to send a signal to a process, you can specify the signal using its number, its name, or its abbreviation. If you use a name or abbreviation, be sure to type uppercase letters. Some of the numbers vary from one type of system to another so, as a general rule, it is best to use names or abbreviations, which are standardized. The signal numbers shown here are the ones used with Linux.

Number  Name     Abbrev.  Description
1       SIGHUP   HUP      Hang-up: sent to processes when you log out or if your terminal disconnects
2       SIGINT   INT      Interrupt: sent when you press ^C
9       SIGKILL  KILL     Kill: immediate termination; cannot be trapped by a process
15      SIGTERM  TERM     Terminate: request to terminate; can be trapped by a process
18      SIGCONT  CONT     Continue: resume suspended process; sent by fg or bg
19      SIGSTOP  STOP     Stop (suspend): sent when you press ^Z

In general, the signal numbers for HUP, INT, KILL and TERM are the same on all systems. However, the other signal numbers can vary from one type of Unix to another. For this reason, it is a good habit to use the names or abbreviations — which are always the same — rather than numbers. The chart in Figure 26-9 shows the signal numbers that are used with Linux.

If you would like to see the full list of signals supported by your system, enter the kill command with the -l (list) option:

kill -l

If this option does not work on your system, you can look for an include file (see Chapter 23) named signal.h and display its contents. Use one of the following commands:

locate signal.h
find / -name 'signal.h' -print 2> /dev/null
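
When kill -l is available, it can also look up a single signal for you. In POSIX-style shells, giving kill -l a signal number prints the matching abbreviation, which is handy when you want to type a name instead of a number. The loop below is a small sketch that looks up the four numbers that are the same on all systems.

```shell
# Look up the abbreviations for the four signal numbers that are the
# same on all systems.
for number in 1 2 9 15; do
    name=$(kill -l "$number")
    echo "signal $number is $name"
done
```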

The kill program lets you specify any signal you want. For example, let's say you want to suspend job %2, which is running in the background. Simply send it the STOP signal:

kill -STOP %2
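
To watch STOP and its counterpart CONT at work, you can try the following sketch. It uses a process ID rather than a job ID so that it also works inside a script, it uses sleep as a stand-in for a real program, and it assumes a Linux-style /proc in which the State line of /proc/PID/status begins with T for a stopped process and S for a sleeping one. The function name state_of is made up for the example.

```shell
# Print the one-letter state of a process (Linux /proc assumption).
state_of() {
    while read -r key value rest; do
        if [ "$key" = "State:" ]; then
            echo "$value"
        fi
    done < "/proc/$1/status"
}

sleep 300 &
pid=$!

kill -STOP "$pid"       # suspend the process
sleep 1
stopped_state=$(state_of "$pid")

kill -CONT "$pid"       # let it continue
sleep 1
resumed_state=$(state_of "$pid")

echo "after STOP: $stopped_state; after CONT: $resumed_state"
kill "$pid"             # finished with the demonstration
```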

If you do not specify a signal, kill will, by default, send the TERM signal. Thus, the following commands (all acting upon process 3662) are equivalent:

kill 3662
kill -15 3662
kill -TERM 3662
kill -SIGTERM 3662

As I mentioned, there are many different signals, and the purpose of the kill command is to send one specific signal to a particular process. In this sense, it might have been better to name the command signal. However, by default, kill sends the TERM signal, which has the effect of killing the process, and this is why the command is called kill. Indeed, most people use kill only for killing processes and not for sending other signals.

For security reasons, a regular userid can send signals only to its own processes. The superuser, however, is allowed to send signals to any process on the system. This means that, if you are using your own system and you become stuck with a process that just won't die, you can always change to superuser and use kill to put the process out of its misery. Be very careful, though: superuser + kill is a highly lethal combination that can get you into a lot of trouble if you don't know exactly what you are doing.

Setting the Priority for a Process: nice

At the beginning of the chapter, I explained that even a small Unix system can have over a hundred processes running at the same time. A large system can have thousands of processes, all of which need to share the system's resources: processors, memory, I/O devices, network connections, and so on. In order to manage such a complex workload, the kernel uses a sophisticated subsystem called the scheduler, whose job is to allot resources dynamically among the various processes.

In making such moment-to-moment decisions, the scheduler considers a variety of different values associated with each process. One of the more important values is the PRIORITY, an indication of how much precedence a process should be given over other processes. The priority is set by a number of factors, which are generally beyond the reach of individual users. This makes sense for two reasons.

First, managing processes efficiently is a very complex operation, and the scheduler can do it much better and much faster than a human being, even an experienced system administrator. Second, if it were possible to manipulate priorities, it would be far too tempting for users to raise the priorities of their own programs at the expense of other users and the system itself.

In certain situations, however, you may want to do the opposite. That is, you may want to lower the priority of one of your programs. Typically, this will happen when you are running a non-interactive program that requires a relatively large amount of CPU time over an extended period. In such cases, you might as well be a nice person and run the program in the background at a low priority. After all, you won't notice if the program takes a bit longer to finish, and running it with a low priority allows the scheduler to give precedence to other programs, making the system more responsive and efficient.

To run programs at a lower priority you use a tool called nice. (Can you see where the name comes from?) The syntax is:

nice [-n adjustment] command

where adjustment is a numeric value, and command is the command you want to run.

The simplest way to use nice is simply to type the name in front of a command you plan on running in the background, for example:

nice gcc myprogram.c &

That's all there is to it. When you start a program in this way, nice will cause it to run at a reduced priority. However, nice will not automatically run the program in the background. You will need to do that yourself by typing an & character at the end of the command.

Which types of programs should you use with nice? In general, any program that can run in the background and which uses a large amount of CPU time. Traditional uses for nice are for programs that compile a large amount of source code, that make (put together) software packages, or that perform complex mathematical computations. For example, if you share a multiuser system and you are testing a program that calculates the 1,000,000th digit of pi, you definitely want to run it with as low a priority as possible.

When it comes to using nice, there are two caveats of which you should be aware. First, you can use nice only with self-contained programs that run on their own. For example, you can use nice with external commands and shell scripts. However, you cannot lower the priority of builtin shell commands (internal commands), pipelines, or compound commands.

The second consideration is that you should use nice only with programs that run in the background. Although it is possible to lower the priority of a foreground program, it doesn't make sense to do so. After all, when a program runs in the foreground, you want it to be able to respond to you as quickly as possible.

In most cases, it will suffice to use nice in the way I have described. On occasion, however, you may want to have more control over how much the priority is lowered. To do so, you can use the -n option followed by a value called the NICE NUMBER or NICENESS. On most systems, you can specify a nice number of 0 to 19. The higher the nice number, the lower the priority of the program (which means the nicer you are as a user).

When you run a program in the regular way (without nice), the program is given a niceness of 0. This is considered to be normal priority. When you use nice without the -n option, it defaults to a niceness of 10, right in the middle of the range. Most of the time, that is all you need. However, you can specify your own value if necessary.
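
You can check these numbers for yourself. With many versions of nice, including the GNU one found on Linux, running nice with no command simply prints the current nice number. That behavior is an assumption here, so check the man page on your system. The variable names are only for illustration.

```shell
# The nice number of the current shell (0 unless it has been changed).
base=$(nice)

# Run "nice" under nice: once with the default adjustment, once with 5.
default_niceness=$(nice nice)
chosen_niceness=$(nice -n 5 nice)

echo "normal: $base  default: $default_niceness  with -n 5: $chosen_niceness"
```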

For example, let's say you have a program called calculate that spends hours and hours performing complex mathematical calculations. To be a nice guy, you decide to run the program in the background with as low a priority as possible. You can use a command like the following:

nice -n 19 calculate > outputfile 2> errorfile &
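
As mentioned above, nice works only with self-contained programs, not with pipelines. A common way around this limitation is to hand the whole pipeline to a new shell and run that shell under nice. A shell is an ordinary external program, and the commands it starts inherit its nice number. The sketch below assumes a version of nice (such as the GNU one) that prints the current nice number when run without a command. The adjustment of 7 is arbitrary.

```shell
# Run an entire pipeline at low priority by wrapping it in a shell.
# The inner "nice" (no arguments) just reports the nice number it sees.
base=$(nice)
inside=$(nice -n 7 sh -c 'nice | cat')
echo "nice number outside: $base, inside the pipeline: $inside"
```

In practice, you would replace the inner command with a real pipeline and run the whole thing in the background, along the lines of nice sh -c 'sort bigfile | uniq -c > counts' & (where bigfile and counts are hypothetical file names).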

By now, you are probably wondering, if a high nice number will lower the priority of a program, will a low nice number raise the priority? The answer is yes, but only if you are superuser. As superuser you are allowed to specify a negative number between -20 and -1. For example, to run a very special program with as high a priority as possible, change to superuser and enter the command:

nice -n -20 specialprogram

As you might imagine, setting a negative nice number is something you would rarely need to do. Indeed, most of the time, you are better off letting the scheduler manage the system on its own.

— hint —

When you share a multiuser system, it is a good habit to use nice to run CPU intensive programs in the background at a low priority. This keeps such programs from slowing down the system for other users.

However, nice can also come in handy on a single-user system. When you force your most demanding background programs to run at a low priority, you prevent them from slowing down your own moment-to-moment work.

(In other words, being nice always pays off.)

Changing the Priority of an Existing Process: renice

On occasion, you may find yourself waiting a long time for a program that is running in the foreground, when it occurs to you that it would make more sense for the program to be running in the background at a low priority. In such cases, all you have to do is press ^Z to suspend the process, use bg to move it to the background, and then lower its priority. To lower the priority of an existing process, you use renice. The syntax is:

renice niceness -p processid

where niceness is a nice number, and processid is a process ID.

As we discussed in the previous section, a higher nice number means a lower priority. When you use renice as a regular user, you are only allowed to increase the nice number for a process, not lower it. That is, you can lower the priority of a process, but not raise it. In addition, as a reasonable precaution, regular users can only change the niceness for their own processes. (Can you see why?) These limitations, however, do not apply to the superuser.

Here is an example of how you might use renice. You enter the following command to run a program that calculates the 1,000,000th digit of pi:

picalculate > outputfile 2> errorfile

After watching your program do nothing for a while, it occurs to you that you might as well run it in the background. At the same time, it would be a good idea to lower the priority as much as possible. First, press ^Z to suspend the foreground process. You see a message like the following:

[1]+  Stopped    picalculate >outputfile 2>errorfile

This tells you that the process, job #1, is suspended. You can now use bg to move the process to the background:

bg %1

You then see the following message, which tells you that the program is running in the background:

[1]+ picalculate >outputfile 2>errorfile &

Next, use ps to find out the process ID:

ps

The output is:

 PID TTY       TIME CMD
4052 tty1  00:00:00 bash
4089 tty1  00:00:00 picalculate
4105 tty1  00:00:00 ps

Finally, use renice to give the process the highest possible niceness, thereby lowering its priority as much as possible:

renice 19 -p 4089

You will see the following confirmation message:

4089: old priority 0, new priority 19
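
The same steps can be collected into a short script. In this sketch, sleep stands in for picalculate, the process ID comes from the shell's $! variable rather than from ps, and the new nice number is read back from field 19 of /proc/PID/stat, which is a Linux-specific detail. All of these are assumptions made for the sake of a harmless example.

```shell
# Start a stand-in for a long computation in the background.
sleep 300 &
pid=$!
sleep 1                          # let the program get started

# Give the process the highest nice number (the lowest priority).
renice 19 -p "$pid" > /dev/null

# Read the nice number back (Linux: field 19 of /proc/PID/stat).
new_niceness=$(cut -d ' ' -f 19 "/proc/$pid/stat")
echo "process $pid now has nice number $new_niceness"

kill "$pid"                      # finished with the demonstration
```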

— hint —

If your system seems bogged down for no apparent reason, you can use top to see if there are any non-interactive processes taking a lot of CPU time. If so, you can consider using renice to lower the priority of the processes. (Warning: Don't muck around with processes you don't understand.)

Daemons

How many processes do you think are running on your system right now? It is easy to find out. Just use the ps program with the appropriate options to list every process, one per line, and then pipe the output to wc -l (Chapter 18) to count the lines.

Here are two commands that will do the job. The first command uses ps with UNIX options. The second uses the BSD options:

ps -e | wc -l
ps ax | wc -l
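
Strictly speaking, these counts are slightly high, because ps prints a heading line and also lists itself. If you want to leave out the heading, one option is to drop the first line with tail before counting. This is only a sketch, and the function name count_processes is made up.

```shell
# Count processes, skipping the heading line that ps prints first.
# The total still includes ps itself, and may include tail and wc.
count_processes() {
    ps -e | tail -n +2 | wc -l
}

if command -v ps > /dev/null 2>&1; then
    echo "processes running: $(count_processes)"
fi
```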

As a test, I ran these commands on three different Unix systems. First, I checked a Solaris system, which I accessed over the Internet. Absolutely no one else was using the system and nothing else was running on it. Next, I checked a Linux system sitting next to me. It was running a full GUI-based desktop environment, and no one else but me was using the system. Finally, I checked a FreeBSD system that acts as a medium-sized Web server and database server.

Here is what I found. Not counting the ps program itself, the small Solaris system was running 46 processes; the small Linux system was running 95 processes; and the medium FreeBSD system was running 133 processes. Remarkably, these are all relatively small numbers. It is not unusual for a good-sized Unix system to be running hundreds or even thousands of processes at the same time.

Obviously, most of these processes are not programs run by users. So what are they? The answer is they are DAEMONS, programs that run in the background, completely disconnected from any terminal, in order to provide a service. (With Microsoft Windows, the same type of functionality is provided by programs called "services".) Typically, a daemon will wait silently in the background for something to happen: an event, a request, an interrupt, a specific time interval, etc. When the trigger occurs, the daemon swings into action, doing whatever is necessary to carry out its job.

Daemons carry out a great many tasks that are necessary to run the system. For reference, I have listed some of the more interesting daemons in Figure 26-10. Although you will find a few daemons (such as init) on most Unix systems, there is a fair bit of variation. To find the daemons on your system, use ps and look in the output for a ? character in the TTY column. This indicates that the process is not controlled by a terminal.

Figure 26-10: Daemons

A daemon is a process that runs silently in the background, disconnected from any terminal, in order to provide a service. Unix systems typically have many daemons, each waiting to perform its job as needed. Here is a list of interesting daemons you will find on many systems. Notice that many of the names end with the letter "d".

Daemon    Purpose
init      Ancestor of all other processes; adopts orphans
apache    Apache Web server
atd       Runs jobs queued by the at program
crond     Manages execution of prescheduled jobs (cron service)
cupsd     Print scheduler (CUPS=Common Unix Printing System)
dhcpd     Dynamically configure TCP/IP information for clients (DHCP)
ftpd      FTP server (FTP=File Transfer Protocol)
gated     Gateway routing for networks
httpd     Web server
inetd     Internet services
kerneld   Loads and unloads kernel modules as needed
kudzu     Detects and configures new/changed hardware during boot
lpd       Print spooling (line printer daemon)
mysql     MySQL database server
named     Internet DNS name service (DNS=Domain Name System)
nfsd      Network file access (NFS=Network File System)
ntpd      Time synchronization (NTP=Network Time Protocol)
rpcbind   Remote procedure calls (RPC)
routed    Manage network routing tables
sched     Another name for swapper
sendmail  SMTP server (email)
smbd      File sharing & printing services for Windows clients (Samba)
sshd      SSH (secure shell) connections
swapper   Copies data from memory to swap space to reclaim physical memory
syncd     Synchronizes file systems with contents of system memory
syslogd   Collects various system messages (system logger)
xinetd    Internet services (replacement for inetd)

The best command to use is the variation of ps that displays all the processes that are not controlled by a terminal:

ps -t - | less

If this command does not work on your system, try the following:

ps -e | grep '?' | less

Most daemons are created automatically during the last part of the boot sequence. In some cases, the processes are created by the init process (process #1). In other cases, the processes are created by parents that terminate themselves, turning the daemons into orphans. As you may remember from our discussion earlier in the chapter, all orphans are adopted by the init process. Thus, one way or the other, most daemons end up as children of process #1. For this reason, one definition of a daemon is any background process without a controlling terminal, whose parent's process ID is #1.
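
This definition translates fairly directly into a pipeline. The sketch below asks ps for four columns (process ID, parent's process ID, terminal, and command name) and keeps the lines whose parent is process #1 and whose terminal is shown as ?. The function name list_daemons is made up, and the ? convention is an assumption that holds for Linux; some systems show a different marker.

```shell
# List the names of processes that have no terminal and whose parent is #1.
# Only the first word of each command name is printed.
list_daemons() {
    ps -e -o pid= -o ppid= -o tty= -o comm= |
        awk '$2 == 1 && $3 == "?" { print $4 }' |
        sort -u
}

if command -v ps > /dev/null 2>&1; then
    list_daemons
fi
```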

If you are using a Linux system, take a moment to look in the /etc/rc.d/init.d directory. Here you will find a large number of shell scripts, each of which is used to start, stop or restart a particular daemon.

What's in a Name?

Daemon


A daemon is a process that runs silently in the background, disconnected from any terminal, in order to provide a service. Although the name is pronounced "dee-mon", it is correctly spelled "daemon".

You may occasionally read that the name stands for "Disk and Executing Monitor", a term from the old DEC 10 and 20 computers. However, this explanation was made up after the fact. The name "daemon" was first used by MIT programmers who worked on CTSS (Compatible Time-sharing System), developed in 1963. They coined the term to refer to what were called "dragons" by other programmers who worked on ITS (Incompatible Time-sharing System).

CTSS and ITS were both ancestors to Unix. ITS was an important, though strange, operating system that developed a cult following at MIT. As CTSS and ITS programmers migrated from MIT to Bell Labs (the birthplace of Unix), the idea of daemons traveled with them.

So why the name "daemon"? One story is that the name comes from "Maxwell's demon", an imaginary creature devised by Scottish physicist James Maxwell (1831-1879) for a thought experiment related to the second law of thermodynamics. You can believe this or not. (I don't.)

Regardless of the origin, nobody knows why we use the British variation of the spelling. In Celtic mythology, a daemon is usually good or neutral, merely a spirit or inspiration. A demon, however, is always an evil entity. Perhaps there is a lesson here somewhere.

The End of the Last Chapter

I would like to thank you for spending so much time with me talking about Unix and Linux. I wrote this book in order to make Unix accessible to intelligent people and, to the extent that I have helped you, I am grateful for the opportunity.

Unix has traditionally attracted the most talented computer users and programmers, for whom working on Unix was a labor of love. One reason Unix is so wonderful is that most of it was designed before the men in suits sat up and took notice. That is why Unix works so well and why it is so elegant; the basic Unix philosophy was developed long before the business and marketing people started trying to make money from it. As we discussed in Chapter 2, in the 1990s, this philosophy was transplanted to the Linux Project and to the open source community, with wonderful results.

You may remember my observing that Unix is not easy to learn, but it is easy to use. By now, you will realize what this means: that it is more important for a tool to be designed well for a smart person, than it is for the tool to be easy enough to be used on the first day by someone whose biggest intellectual challenge in life is downloading a ringtone.

You have my word that every moment you spend learning and using Unix will repay you generously. I can't be by your side as you work, but you do have this book, and I have put in a great deal of effort to provide you with the very best Unix companion I could.

Although I may be irreverent at times — indeed, whenever I am able to make a joke that my editors can't catch — I would like to take a moment to wish you the very best. As you read and re-read this book, please remember: I am on your side.

— hint —

Unix is fun.

Exercises

Review Question #1:

What is a process? What part of the operating system manages processes?

Define the following terms: process ID, parent process, child process, fork and exec.

Review Question #2:

What is a job? What part of the operating system manages jobs?

What is job control?

What is the difference between running a job in the foreground and running a job in the background?

How do you run a job in the foreground?

How do you run a job in the background?

How do you move a job from the foreground to the background?

Review Question #3:

The ps (process status) program is used to display information about processes. What are the two types of options you can use with this program?

For each type of option, which commands would you use to display information about:

• Your current processes
• Process #120357
• All the processes running on the system
• Processes associated with userid weedly

Review Question #4:

You are a system administrator. One of your systems seems to be bogging down and your job is to figure out why. To start, you want to take a look at various processes running on the system and how they are changing from moment to moment. Which program will you use? Specify the command that will run this program with an automatic update every 5 seconds.

Review Question #5:

What is the difference between killing a process and stopping a process?

How do you kill a process?

How do you stop a process?

Review Question #6:

You have started a program named foobar that is running amok. What steps would you take to kill it? If foobar does not respond, what do you do?

Applying Your Knowledge #1:

Enter a command line that pauses for 5 seconds and then displays the message "I am smart." Wait for 5 seconds to make sure it works.

Now change the delay to 30 seconds, and re-enter the command line. This time, before the 30 seconds are up, press ^C. What happens? Why?

Applying Your Knowledge #2:

You have just logged into a Unix system over the Internet using your userid, which is weedly. You enter the command ps -f and see:

UID    PID  PPID C STIME TTY   TIME     CMD
weedly 2282 2281 0 15:31 pts/3 00:00:00 -bash
weedly 2547 2282 0 16:13 pts/3 00:00:00 ps -f

Everything looks fine. Just out of curiosity, you decide to check on the rest of the system, so you enter the command ps -af. Among the output lines, you see:

weedly 2522 2436 0 16:09 pts/4 00:00:00 vim secret

Someone else is logged in using your userid! What do you do?

Applying Your Knowledge #3:

Create a pipeline to count the number of daemons on your system. Then create a second pipeline to display a sorted list of all the daemons. You should display the names of the daemons only and nothing else.

For Further Thought #1:

Using the kill command to kill processes is more complicated than it needs to be. Describe a simpler way to provide the same functionality.

For Further Thought #2:

Why are there two different types of options for ps? Is this good or bad?
