Harley Hahn's Guide to Unix and Linux
Chapter 15: Standard I/O, Redirection and Pipes
For that matter, why should every program you run have to know where its output was going? Sometimes you want output to be displayed on the screen; other times you want to save it in a file. There may even be times when you want to send output to another program for more processing.

For these reasons, the Unix designers built a single tool whose job was to display data, one screenful at a time. This tool was called more, because after displaying a screenful of data, the program displayed the prompt --More-- to let the user know there was more data. The tool was simple to use. A user would read one screenful of data and then press <Space> to display the next screen. When the user was finished, he would type q to quit.

Once more was available, programmers could stop worrying about how the output of their programs would be displayed. If you were a programmer, you knew that whenever a user running your program found himself with a lot of output, he would simply send it to more. (You'll learn how to do this later in the chapter.) If you were a user, you knew that, no matter how many programs you might ever use, you only needed to learn how to use one output display tool.

This approach has three important advantages, even today. First, when you design a Unix tool, you can keep it simple. For example, you do not have to endow every new program with the ability to display data one screenful at a time: there is already a tool to do that. Similarly, there are also tools to sort output, remove certain columns, delete lines that do not contain specific patterns, and on and on (see Chapter 16). Since these tools are available to all Unix users, you don't have to include such functionality in your own programs.

This leads us to the second advantage. Because each tool need only do one thing, you can, as a programmer, focus your effort. When you are designing, say, a program to search for specific patterns in a data file, you can make it the best possible pattern searching program; when you are designing a sorting program, you can make it the best possible sorting program; and so on.

The third advantage is ease of use. As a user, once you learn the commands to control the standard screen display tool, you know how to control the output for any program.

Thus, in two sentences, I can summarize the Unix philosophy as follows:

• Each program or command should be a tool that does only one thing and does it well.

• When you need a new tool, it is better to combine existing tools than to write new ones.

We sometimes describe this philosophy as:

• "Small is beautiful" or "Less is more".
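If you would like to try more right now, you can page through any long text file. For example, assuming your system has the /etc/passwd file (which we will meet again later in this chapter), you might enter:

more /etc/passwd

Press <Space> to display the next screenful, and type q when you want to quit.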
Since Unix is well into its fourth decade, it makes sense to ask if the Unix philosophy has, in the long run, proven to be successful. The answer is, yes and no.

To a large extent, the Unix philosophy is still intact. As you will see in Chapter 16, there are a great many single-purpose tools, which are easy to combine as the need arises. Moreover, because the original Unix developers designed the system so well, programs that are over 30 years old can, today, work seamlessly with programs that are brand new. Compare this to the world of Windows or the Macintosh. However, the original philosophy has proved inadequate in three important ways.

First, too many people could not resist creating alternative versions of the basic tools. This means that you must sometimes learn how to use more than one tool to do the same job. For example, over the years, there have been three screen display programs in common use: more, pg and less. These days, most people use less, which is the most powerful (and most complex) of the three programs. However, more is simpler to use, and you will find it on all systems, so you really should know how to use it. My guess is that, one day, you will log in to a system that uses more to display output and, if you only know less, you will be confused. On the other hand, just understanding more is not good enough because, on many systems, less is the default (and less is a better program). The bottom line: you need to learn how to use at least two different screen display programs.

The second way in which the Unix philosophy was inadequate had to do with the growing needs of users. The idea that small is beautiful has a lot of appeal, but as users grew more sophisticated and their needs grew more demanding, it became clear that simple tools were often not enough. For instance, the original Unix text editor was ed. (The name, which stands for "editor", is pronounced as two separate letters, "ee-dee".) ed was designed to be used with terminals that printed output on paper. The ed program had relatively few commands; it was simple to use and could be learned quickly. If you had used Unix in the early days, you would have found ed to be an unadorned, unpretentious tool: it did one thing (text editing) and it did it well(*).

* Footnote: The ed editor is still available on all Unix and Linux systems. Give it a try when you get a moment. Start by reading the man page (man ed).

As editors go, ed was, at best, mildly sophisticated. Within a few years, however, terminals were improved and the needs of Unix users became more demanding. To respond to those needs, programmers developed new editors. In fact, over the years, literally tens of different text editors were developed. For mainstream users, ed was replaced by a program called ex. (The name, which stands for "extended editor", is pronounced as two separate letters, "ee-ex".) Then, ex itself was extended to create vi ("visual editor", pronounced "vee-eye"). As an alternative to the ed/ex/vi family, an entirely different editing system called Emacs was developed. Today, vi and Emacs are the most popular Unix text editors, but no one would ever accuse them of being simple and unadorned. Indeed, vi (Chapter 22) and Emacs are extremely complex.

The third way in which the original Unix philosophy has proven inadequate has to do with a basic limitation of the CLI (command line interface). As you know, the CLI is text-based.
This means it cannot handle graphics and pictures, or files that do not contain plain text, such as spreadsheets or word processor documents. Most command-line programs read and write text, which is why such programs are able to work together: they all use the same type of data. However, this means that when you want to use other types of data — non-textual data — you must use other types of programs. This is why, as we discussed in Chapters 5 and 6, you must learn how to use both the CLI and GUI environments.

For these reasons, you must approach the learning of Unix carefully. In 1979, when Unix was only a decade old, the original design was still intact, and you could learn most everything about all the common commands. Today, there is so much more Unix to learn, you can't possibly know it all, or even most of it. This means you must be selective about which programs and tools you want to learn. Moreover, as you teach yourself how to use a tool, you must be selective about which options and commands you want to master.

As you read the next two chapters, there is something important I want you to remember. By all means, you should work in front of your computer as you read, and enter new commands as you encounter them. This is how you learn to use Unix. However, I want you to do more than just memorize details. As you read and as you experiment, I want you to develop a sense of perspective. Every now and then, take a moment to pull back and ask yourself, "Where does the tool I am learning right now fit into the big picture?" My goal for you is that, in time, you will come to appreciate what we might call the new Unix philosophy:

• "Small is beautiful, except when it isn't."

hint
Whenever you learn how to use a new program, do not feel as if you must memorize every detail. Rather, just answer three questions:
If there is one single idea that is central to using Unix effectively, it is the concept of standard input and output. Understand this one idea, and you are one giant step closer to becoming a Unix master. The basic idea is simple: Every text-based program should be able to accept input from any source and write output to any target.

For instance, say you have a program that sorts lines of text. You should have your choice of typing the text at the keyboard, reading it from an existing file, or even using the output of another program. Similarly, the sort program should be able to display its output on the screen, write it to a file, or send it to another program for more processing.

Using such a system has two wonderful advantages. First, as a user, you have enormous flexibility. When you run a program, you can define the input and output (I/O) as you see fit, which means you only have to learn one program for each task. For example, the same program that sorts a small amount of data and displays it on the screen can also sort a large amount of data and save it to a file.

The second advantage to doing I/O in this way is that it makes creating new tools a lot easier. When you write a program, you can depend on Unix to handle the input and output for you, which means you don't need to worry about all the variations. Instead, you can concentrate on the design and programming of your tool. The crucial idea here is that the source of input and the target of output are not specified by the programmer. Rather, he or she writes the program to read and write in a general way. Later, at the time the program runs, the shell connects the program to whatever input and output the user wants to use(*).

* Footnote: Historically, the idea of using abstract I/O devices was developed to allow programmers to write programs that were independent of specific hardware. Can you see how, philosophically, this idea is related to the layers of abstraction we discussed in Chapter 5, and to the terminal description databases (Termcap and Terminfo) we discussed in Chapter 7?

To implement this idea, the developers of Unix designed a general way to read data called STANDARD INPUT and two general ways to write data called STANDARD OUTPUT and STANDARD ERROR. The reason there are two different output targets is that standard output is used for regular output, and standard error is used for error messages. Collectively, we refer to these facilities as STANDARD I/O (pronounced "standard eye-oh").

In practice, we often speak of these three terms informally as if they were actual objects. Thus, we might say, "To save the output of a program, write standard output to a file." What we really mean is, "To save the output of a program, tell the shell to set the output target to be a file."

Understanding the concepts of standard input, standard output, and standard error is crucial to using Unix well. Moreover, these same concepts are also used to control I/O with other programming languages, such as C and C++.

hint
You will often see standard input, standard output, and standard error abbreviated as STDIN, STDOUT and STDERR. When we use these abbreviations in conversation, we pronounce them as "standard in", "standard out", and "standard error". For example, if you were creating some documentation, you might write, "The sort program reads from stdin, and writes to stdout and stderr."
If you were reading this sentence to an audience, you would pronounce the abbreviations as follows: "The sort program reads from standard in, and writes to standard out and standard error."
When you log in, the shell automatically sets standard input to the keyboard, and standard output and standard error to the screen. This means that, by default, most programs will read from the keyboard and write to the screen. However — and here's where the power of Unix comes in — every time you enter a command, you can tell the shell to reset standard input, standard output or standard error for the duration of the command. In effect, you can tell the shell: "I want to run the sort command and save the output to a file called names. So for this command only, I want you to write standard output to that file. After the command is over, I want you to reset standard output back to my screen."

Here is how it works: If you want the output of a command to go to the screen, you don't have to do anything. This is automatic. If you want the output of a command to go to a file, type > (greater-than) followed by the name of the file, at the end of the command. For example:

sort > names

This command will write its output to a file called names. The use of a > character is apt, because it looks like an arrow showing the path of the output.

When you write output to a file in this way, the file may or may not already exist. If the file does not exist, the shell will create it for you automatically. In our example, the shell will create a file called names. If the file already exists, its contents will be replaced, so you must be careful. For instance, if the file names already exists, the original contents will be lost permanently.

In some cases, this is fine, because you do want to replace the contents of the file, perhaps with newer information. In other cases, you may not want to lose what is in the file. Rather, you want to add new data to what is already there. To do so, use >>, two greater-than characters in a row. This tells the shell to append any new data to the end of an existing file. Thus, consider the command:

sort >> names

If the file names does not exist, the shell will create it. If it does exist, the new data will be appended to the end of the file. Nothing will be lost.

When we send standard output to a file, we say that we REDIRECT it. Thus, in the previous two examples, we redirect standard output to a file called names. Now you can see why there are two types of output: standard output and standard error. If you redirect the standard output to a file, you won't miss the error messages, as they will still be displayed on your monitor.

When you redirect output, it is up to you to be careful, so you do not lose valuable data. There are two ways to do so. First, every time you redirect output to a file, think carefully: Do you want to replace the current contents of the file? If so, use >. Or would you rather append new data to the end of the file? If that is the case, use >>. Second, as a safeguard, you can tell the shell to never replace the contents of an existing file. You do this by setting the noclobber option (Bash, Korn shell) or the noclobber shell variable (C-Shell, Tcsh). We'll discuss this in the next section.
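Here is a short sequence you can use to see the difference between > and >> for yourself. It uses the date command, simply because date is a convenient way to generate a single line of output, and the cat command (Chapter 16) to display the result. (The file name log is arbitrary; the example assumes log does not already exist.)

date > log     # create the file log
date >> log    # append a second line to log
cat log        # display the file: it now contains two lines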
In the previous section, I explained that when you redirect standard output to a file, any data that already exists in the file will be lost. I also explained that when you use >> to append output to a file, the file will be created if it does not already exist. There may be times when you do not want the shell to make such assumptions on your behalf.

For example, say you have a file called names that contains 5,000 lines of data. You want to append the output of a sort command to the end of this file. In other words, you want to enter the command:

sort >> names

However, you make a mistake and accidentally enter:

sort > names

What happens? All of your original data is wiped out. Moreover, it is wiped out quickly. Even if you notice the error the moment you press <Return>, and even if you instantly press ^C to abort the program (by sending the intr signal; see Chapter 7), it is too late. The data in the file is gone forever. Here is why: as soon as you press <Return>, the shell gets everything ready for the sort program by deleting the contents of your target file. Since the shell is a lot faster than you, by the time you abort the program the target file is already empty.

To prevent such catastrophes, you can tell the shell not to replace an existing file when you use > to redirect output. In addition, with the C-Shell family, you can also tell the shell not to create a new file when you use >> to append data. This ensures that no files are replaced or created by accident. To have the shell take such precautions on your behalf, you use what we might call the noclobber facility.

With the Bourne shell family (Bash, Korn shell), you set the noclobber shell option:

set -o noclobber

To unset this option, use:

set +o noclobber

With the C-Shell family (C-Shell, Tcsh), you set the noclobber shell variable:

set noclobber

To unset this variable, use:

unset noclobber

(See Chapter 12 for a discussion of options and variables; see Appendix G for a summary.)

Once noclobber is set, you have built-in protection. For example, let's say you already have a file called names and you enter:

sort > names

You will see an error message telling you that the file names already exists. Here is such a message from Bash:

bash: names: cannot overwrite existing file

Here is the equivalent message from the Tcsh:

names: File exists.

In both cases, the shell has refused to carry out the command, and your file is safe.

What if you really want to replace the file? In such cases, it is possible to override noclobber temporarily. With a Bourne shell, you use >| instead of >:

sort >| names

With a C-Shell, you use >! instead of >:

sort >! names

Using >| or >! instead of > tells the shell to redirect standard output even if the file exists.

As we discussed earlier, you can append data to a file by redirecting standard output with >> instead of >. In both cases, if the output file does not exist, the shell will create it. However, if you are appending data, it would seem likely that you expect the file to already exist. Thus, if you use >> and the file does not exist, you are probably making a mistake. Can noclobber help you here? Not with a Bourne shell. If you append data with Bash or the Korn shell and the noclobber option is set, it's business as usual. The C-Shell and Tcsh know better. They will tell you that the file does not exist, and refuse to carry out the command.

For example, say you are a C-Shell or Tcsh user; the noclobber shell variable is set; and you have a file named addresses, to which you want to append data.
You enter the command:

sort >> address

You will see an error message:

address: No such file or directory

At which point you will probably say, "Oh, I should have typed addresses, not address. Thank you, Mr. C-Shell."

Of course, there may be occasions when you are appending data to a file, and you really do want to override noclobber. For example, you are a C-Shell user and, for safety, you have set noclobber. You want to sort a file named input and append the data to a file named output. If output doesn't exist, you want to create it. The important thing is, if output does exist, you don't want to lose what is already in it, which is why you are appending (>>), not replacing (>). If noclobber wasn't set, you would use:

sort >> output

Since noclobber is set, you must override it. To do so, just use >>! instead of >>:

sort >>! output

This will override the automatic check for this one command only.
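If you want to watch noclobber in action, here is a short Bash sequence you can try. (The date command is used only because it is a handy way to generate output; the file name log is arbitrary, and the example assumes log does not exist when you start.)

set -o noclobber    # turn on the protection
date > log          # works: log does not exist, so it is created
date > log          # fails: bash: log: cannot overwrite existing file
date >| log         # works: >| overrides noclobber for this command only
set +o noclobber    # turn the protection back off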
By default, standard input is set to the keyboard. This means that, when you run a program that needs to read data, the program expects you to enter the data by typing it, one line at a time. When you are finished entering data, you press ^D (<Ctrl-D>) to send the eof signal (see Chapter 7). Pressing ^D indicates that there is no more data.

Here is an example you can try for yourself. Enter:

sort

The sort program is now waiting for you to enter data from standard input (the keyboard). Type as many lines as you want. For example, you might enter:
Harley
Casey
After you have pressed <Return> on the last line, press ^D to send the eof signal. The sort program will now sort all the data alphabetically and write it to standard output. By default this is the screen, so you will see:
Casey
Harley
There will be many times, however, when you want to redirect standard input to read data from a file, rather than from the keyboard. Simply type < (less-than), followed by the name of the file, at the end of the command. For example, to sort the data contained in a file called names, use the command:

sort < names

As you can see, the < character is a good choice as it looks like an arrow showing the path of the input.

Here is an example you can try for yourself. As I mentioned in Chapter 11, the basic information about each userid is contained in the file /etc/passwd. You can display a sorted version of this file by entering the command:

sort < /etc/passwd

As you might imagine, it is possible to redirect both standard input and standard output at the same time, and this is done frequently. Consider the following example:

sort < rawdata > report

This command reads data from a file named rawdata, sorts it, and writes the output to a file called report.
Although the following discussion is oriented towards the Bourne shell family, we will be talking about important ideas regarding Unix I/O. For that reason, I'd like you to read this entire section, regardless of which shell you happen to be using right now.

As I explained earlier, the shell provides two different output targets: standard output and standard error. Standard output is used for regular output; standard error is used for error messages. By default, both types of output are displayed on the screen. However, you can separate the two output streams should the need arise.

If you choose to separate the output streams, you have a lot of flexibility. For example, you can redirect standard output to a file, where it will be saved. At the same time, you can leave standard error alone, so you won't miss any error messages (which will be displayed on the screen). Alternatively, you can redirect standard output to one file and standard error to another file. Or you can redirect both types of output to the same file. Alternatively, you can send standard output or standard error (or both) to another program for further processing. I'll show you how to do that later in the chapter when we discuss pipelines.

The syntax for redirecting standard error is different for the two shell families. We'll talk about the Bourne shell family first, and then move on to the C-Shell family. To prepare you, however, I need to take a moment to explain one aspect of how Unix handles I/O.

Within a Unix process, every input source and every output target is identified by a unique number called a FILE DESCRIPTOR. For example, a process might read data from file #8 and write data to file #6. When you write programs, you use file descriptors to control the I/O, one for each file you want to use.

Within the Bourne shell family, the official syntax for redirecting input or output is to use the number of a file descriptor followed by < (less-than) or > (greater-than). For example, let's say a program named calculate is designed to write output to a file with file descriptor 8. You could run the program and redirect its output to a file named results by using the command:

calculate 8> results

By default, Unix provides every process with three pre-defined file descriptors, and most of the time that is all you will need. The default file descriptors are 0 for standard input, 1 for standard output, and 2 for standard error. Thus, within the Bourne shell family, the syntax for redirecting standard input is to use 0< followed by the name of the input file. For example:

command 0< inputfile

where command is the name of a command, and inputfile is the name of a file. The syntax for redirecting standard output and standard error is similar. For standard output:

command 1> outputfile

For standard error:

command 2> errorfile

where command is the name of a command, and outputfile and errorfile are the names of files.

As a convenience, if you leave out the 0 when you redirect input, the shell assumes you are referring to standard input. Thus, the following two commands are equivalent:
sort 0< rawdata
sort < rawdata
Similarly, if you leave out the 1 when you redirect output, the shell assumes you are referring to standard output. Thus, the following two commands are also equivalent:
sort 1> results
sort > results
Of course, you can use more than one redirection in the same command. In the following examples, the sort command reads its input from a file named rawdata, writes its output to a file named results, and writes any error messages to a file named errors:
sort 0< rawdata 1> results 2> errors
sort < rawdata > results 2> errors
Notice that you can leave out the file descriptor only for standard input and standard output. With standard error, you must include the 2. This is shown in the following simple example, in which standard error is redirected to a file named errors:

sort 2> errors

When you redirect standard error, it doesn't affect standard input or standard output. In this case, standard input still comes from the keyboard, and standard output still goes to the monitor.

As with all redirection, when you write standard error to a file that already exists, the new data will replace the existing contents of the file. In our last example, the contents of the file errors would be lost. If you want to append new output to the end of a file, just use 2>> instead of 2>. For example:

sort 2>> errors

Redirecting standard error with the C-Shell family is a bit more complicated. Before we get to it, I need to take a moment to discuss an important facility called subshells. Even if you don't use the C-Shell or Tcsh, I want you to read the next section, as subshells are important for everyone.
To understand the concept of a subshell, you need to know a bit about Unix processes. In Chapter 26, we will discuss the topic in great detail. For now, here is a quick summary.

A PROCESS is a program that is loaded into memory and ready to run, along with the program's data and the information needed to keep track of that program. When a process needs to start another process, it creates a duplicate process. The original is called the PARENT; the duplicate is called the CHILD. The child starts running and the parent waits for the child to die (that is, to finish). Once the child dies, the parent then wakes up, regains control and starts running again, at which time the child vanishes.

To relate this to your minute-to-minute work, think about what happens when you enter a command. The shell parses the command and figures out whether it is an internal command (one that is built into the shell) or an external command (a separate program). When you enter a builtin command, the shell interprets it directly within its own process. There is no need to create a new process. When you enter an external command, the shell finds the appropriate program and runs it as a new process. When the program terminates, the shell regains control and waits for you to enter another command. In this case, the shell is the parent, and the program it runs on your behalf is the child.

Consider what happens when you start a brand new shell for yourself. For instance, if you are using Bash and you enter the bash command (or if you are using the C-Shell, and you enter the csh command, and so on). The original shell (the parent) starts a new shell (the child). Whenever a shell starts another shell, we call the second shell a SUBSHELL. Thus, we can say that, whenever you start a new shell (by entering bash or ksh or csh or tcsh), you cause a subshell to be created. Whatever commands you now enter will be interpreted by the subshell. To end the subshell, you press ^D to send the eof signal (see Chapter 7). At this point, the parent shell regains control. Now, whatever commands you enter are interpreted by the original shell.

When a subshell is created, it inherits the environment of the parent (see Chapter 12). However, any changes the subshell makes to the environment are not passed back to the parent. Thus, if a subshell modifies or creates environment variables, the changes do not affect the original shell. This means that, within a subshell, you can do whatever you want without affecting the parent shell.

This capability is so handy that Unix gives you two ways to use subshells. First, as I mentioned above, you can enter a command to start a brand new shell explicitly. For example, if you are using Bash, you would enter bash. You can now do whatever you want without affecting the original shell. For instance, if you were to change an environment variable or a shell option, the change would disappear as soon as you entered ^D; that is, the moment that the new shell dies and the original shell regains control.

There will be times when you want to run a small group of commands, or even a single command, in a subshell without having to deal with a whole new shell. Unix has a special facility for such cases: just enclose the commands in parentheses. That tells the shell to run the commands in a subshell. For example, to run the date command in a subshell, you would use:

(date)

Of course, there is no reason to run date in a subshell. Here, however, is a more realistic example using directories.
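Before we get to that example, here is a small experiment you can use to convince yourself that a subshell cannot affect its parent. (This example assumes Bash; the variable name DEMO is arbitrary.)

export DEMO=parent                 # create a variable in the current shell
(export DEMO=child; echo $DEMO)    # the subshell changes its own copy: displays "child"
echo $DEMO                         # the parent shell is unaffected: displays "parent"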
In Chapter 24, we will discuss directories, which are used to contain files. You can create as many directories as you want and, as you work, you can move from one directory to another. At any time, the directory in which you are currently working is called your working directory.

Let's say you have two directories named documents and spreadsheets, and you are currently working in the documents directory. You want to change to the spreadsheets directory and run a program named calculate. Before you can run the program, you need to set the environment variable DATA to the name of a file that contains certain raw data. In this case, the file is named statistics. Once the program has run, you need to restore DATA to its previous value, and change back to the documents directory. (In other words, you need to reset the environment to its previous state.)

One way to do this is to start a new shell, then change your working directory, change the value of DATA, and run the calculate program. Once this is all done, you can exit the shell by pressing ^D. When the new shell ends and the old shell regains control, your working directory and the variable DATA will be in their original state. Here is what it looks like, assuming you use Bash for your shell. (The cd command, which we will meet in Chapter 24, changes your working directory. Don't worry about the syntax for now.)
bash
cd ../spreadsheets
export DATA=statistics
calculate
^D
Here is an easier way, using parentheses:

(cd ../spreadsheets; export DATA=statistics; calculate)

When you use a subshell in this way, you don't have to worry about starting or stopping a new shell. It is done for you automatically. Moreover, within the subshell, you can do anything you want to the environment without having permanent effects. For example, you can change your working directory, create or modify environment variables, create or modify shell variables, change shell options, and so on.

You will sometimes see the commands within the parentheses called a GROUPING, especially when you are reading documentation for the C-Shell family. In our example, for instance, we used a grouping of three commands. The most common reason to use a grouping and a subshell is to prevent the cd (change directory) command from affecting the current shell. The general format is:

(cd directory; command)
Within the Bourne shell family, redirecting standard error is straightforward. You use 2> followed by the name of a file. With the C-Shell family (C-Shell, Tcsh), redirecting standard error is not as simple, because of an interesting limitation, which I'll get to in a moment.

With the C-Shell family, the basic syntax for redirecting standard error is:

command >& outputfile

where command is the name of a command, and outputfile is the name of a file. For example, if you are using the C-Shell or Tcsh, the following command redirects standard error to a file named output:

sort >& output

If you want to append the output to the end of an existing file, use >>& instead of >&. In the following example, the output is appended to a file named output:

sort >>& output

If you have set the noclobber shell variable (explained earlier in the chapter) and you want to override it temporarily, use >&! instead of >&. For example:

sort >&! output

In this example, the contents of the file will be replaced, even if noclobber is set.

So what is the limitation I mentioned? When you use >& or >&!, the shell redirects both standard output and standard error. In fact, within the C-Shell family, there is no simple way to redirect standard error all by itself. Thus, in the last example, both the standard output and standard error are redirected to a file named output.

It happens that there is a way to redirect standard error separately from standard output. However, in order to do it, you need to know how to use subshells (explained in the previous section). The syntax is:

(command > outputfile) >& errorfile

where command is the name of a command, and outputfile and errorfile are the names of files. For example, say you want to use sort with standard output redirected to a file named output, and standard error redirected to a file named errors. You would use:

(sort > output) >& errors

In this case, sort runs in a subshell and, within that subshell, standard output is redirected. Outside the subshell, what is left of the output — standard error — is redirected to a different file. The net effect is to redirect each type of output to its own file.

Of course, if you want, you can append the output by using >> and >>&. For example, to append standard output to a file named output, and append standard error to a file named errors, use a command like the following:

(sort >> output) >>& errors
All shells allow you to redirect standard output and standard error. But what if you want to redirect both standard output and standard error to the same place? With the C-Shell family, this is easy, because when you use >& (replace) or >>& (append), the shell automatically combines both output streams. For example, in the following C-Shell commands, both standard output and standard error are redirected to a file named output:
sort >& output
sort >>& output
With the Bourne shell family, the scenario is more complicated. We'll talk about the details, and then I'll show you a shortcut that you can use with Bash.

The basic idea is to redirect one type of output to a file, and then redirect the other type of output to the same place. The syntax to do so is:

command x> outputfile y>&x

where command is the name of a command, x and y are file descriptors, and outputfile is the name of a file. For example, in the following sort command, standard output (file descriptor 1) is redirected to a file named output. Then standard error (file descriptor 2) is redirected to the same place as file descriptor 1. The overall effect is to send both regular output and error messages to the same file:

sort 1> output 2>&1

Since file descriptor 1 is the default for redirected output, you can leave out the first instance of the number 1:

sort > output 2>&1

Before we move on, I'd like to talk about an interesting mistake that is easy to make. What happens if you reverse the order of the redirections?

sort 2>&1 > output

Although this looks almost the same as the example above, it won't work. Here is why: The instruction 2>&1 tells the shell to send the output of file descriptor 2 (standard error) to the same place as the output of file descriptor 1 (standard output). However, in this case, the instruction is given to the shell before standard output is redirected. Thus, when the shell processes 2>&1, standard output is still being sent to the monitor (by default). This means that standard error ends up being redirected to the monitor, which is where it was going in the first place. The net result is that standard error goes to the monitor, while standard output goes to a file. (Take a moment to think about this, until it makes sense.)

To continue, what if you want to redirect both standard output and standard error, but you want to append the output to a file? Just use >> instead of >:

sort >> output 2>&1

In this case, using >> causes both standard output and standard error to be appended to the file named output.

You might ask, is it possible to combine both types of output by starting with standard error? That is, can you redirect standard error to a file and then send standard output to the same place? The answer is yes:
sort 2> output 1>&2
This command is different from the earlier examples, but it has the same effect.

As you can see, the Bourne shell family makes combining two output streams complicated. Can it not be made simpler? Why not just send both standard output and standard error to the same file directly? For example:

sort > output 2> output

Although this looks as if it might work, it won't, because, if you redirect to the same file twice in one command, one of the redirections will obliterate the other one.

And now the shortcut. You can use the above technique with all members of the Bourne shell family, in particular, with Bash and the Korn shell. With Bash, however, you can also use either &> or >& (choose the one you like best) to redirect both standard output and standard error at the same time:
sort &> output
This allows you to avoid having to remember the more complicated pattern. However, if you want to redirect both standard output and standard error and append the output, you will need to use the pattern we discussed above:

sort >> output 2>&1

By now, if you are normal, you are probably getting a bit confused. Don't worry. Everything we have been discussing in the last few sections is summarized in Figures 15-1 and 15-2 (later in the chapter). My experience is that, with a bit of practice, you'll find the rules for redirection easy to remember.
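If you want to see for yourself the difference that ordering makes, try both forms with a command that writes to both streams. (The ls -l a b example is developed later in this chapter; it is borrowed here ahead of time because it conveniently produces both regular output and an error message.)

ls -l a b > output 2>&1    # both streams go into the file: nothing appears on the screen
ls -l a b 2>&1 > output    # only standard output is redirected: the error message still appears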
Why would you want to throw away output? Occasionally, you will run a program because it performs a specific action, but you don't really care about the output. Other times, you might want to see the regular output, but you don't care about error messages. In the first case, you would throw away standard output; in the second case, you would throw away standard error.

To do so, all you have to do is redirect the output and send it to a special file named /dev/null. (The name is pronounced "slash-dev-slash-null", although you will sometimes hear "dev-null".) The name /dev/null will make sense after you read about the Unix file system in Chapter 23. The important thing about /dev/null is that anything you send to it disappears forever(*). When Unix people gather, you will sometimes hear /dev/null referred to, whimsically, as the BIT BUCKET.

* Footnote:
Said a widower during a lull,
For example, let's say you have a program named update that reads and modifies a large number of data files. As it does its work, update displays statistics about what is happening. If you don't want to see the statistics, just redirect standard output to /dev/null:

update > /dev/null

Similarly, if you want to see the regular output, but not any error messages, you can redirect standard error. With the Bourne shell family (Bash, Korn shell), you would use:

update 2> /dev/null

With the C-Shell family (C-Shell, Tcsh) you would use:

update >& /dev/null

As I explained earlier, the above C-Shell command redirects both standard output and standard error, effectively throwing away all the output. You can do the same with the Bourne shell family as follows:

update > /dev/null 2>&1

So what do you do if you are using a C-Shell and you want to throw away the standard error, but not the standard output? You can use a technique we discussed earlier when we talked about how to redirect standard error and standard output to different files. In that case, we ran the command in a subshell as follows:

(update > output) >& errors

Doing so allowed us to separate the two output streams. Using the same construction, we can throw away standard error by redirecting it to /dev/null. At the same time, we can preserve the standard output by redirecting it to /dev/tty:

(update > /dev/tty) >& /dev/null

The special file /dev/tty represents the terminal. We'll discuss the details in Chapter 23. For now, all you need to know is that, when you send output to /dev/tty, it goes to the monitor. In this way, we can make the C-Shell and Tcsh send standard output to the monitor while throwing away standard error(*).

* Footnote: If you are thinking, "We shouldn't have to go to such trouble to do something so simple," you are right. This is certainly a failing of the C-Shell family. Still, it's cool that we can do it.
Redirecting standard input, standard output, and standard error is straightforward. The variations, however, can be confusing. Still, my goal is that you should become familiar with all the variations — for both shell families — which will take a bit of practice. To make it easier, I can help you in two ways.

First, for reference, Figures 15-1 and 15-2 contain summaries of all the redirection metacharacters. Figure 15-1 is for the Bourne shell family; Figure 15-2 is for the C-Shell family. Within these summaries you will see all the features we have covered so far. You will also see a reference to piping. This refers to using the output of one program as the input to another program, which we discuss in the next section.

The second bit of help I have for you is in the form of an example you can use to experiment. In order to experiment with standard output and standard error, you will need a simple command that generates both regular output as well as an error message. The best such command I have found is a variation of ls. The ls (list) command displays information about files, and we will meet it formally in Chapter 24. With the -l (long) option, ls displays detailed information about each file.

The idea is to use ls -l to display information about two files, a and b. File a will exist, but file b will not. Thus, we will see two types of output: standard output will display information about file a; standard error will display an error message saying that file b does not exist. You can then use this sample command to practice redirecting standard output and standard error.

Before we can start, we must create file a. To do that, we use the touch command. We'll talk about touch in Chapter 25. For now, all you need to know is that if you use touch with a file that does not exist, it will create an empty file with that name. Thus, if a file named a does not exist, you can create one by using:

touch a

We can now use ls to display information about both a (which exists) and b (which doesn't exist):

ls -l a b

Here is some typical output:
b: No such file or directory
-rw------- 1 harley staff 0 Jun 17 13:42 a
The first line is standard error. It consists of an error message telling us that file b does not exist. The second line is standard output. It contains the information about file a. (Notice that the file name is at the end of the line.) Don't worry about the details. We'll talk about them in Chapter 24.

We are now ready to use our sample command to experiment. Take a look at Figures 15-1 and 15-2, and choose something to practice. As an example, let's redirect standard output to a file named output:

ls -l a b > output

When you run this command, you will not see standard output, as it has been sent to the file output. However, you will see standard error:

b: No such file or directory

To check the contents of output, use the cat command. (We'll talk about cat in Chapter 16.)

cat output

In this case, cat will display the contents of output, the standard output from the previous command:

-rw------- 1 harley staff 0 Jun 17 13:42 a

Here is one more example. You are using Bash and you want to practice redirecting standard output and standard error to two different files:

ls -l a b > output 2> errors

Since all the output was redirected, you won't see anything on your screen. To check standard output, use:

cat output

To check standard error, use:

cat errors

As you are experimenting, you can delete a file by using the rm (remove) command. For example, to delete the files output and errors, use:

rm output errors

When you are finished experimenting, you can delete the file a by using:

rm a

Now that you have a good sample command (ls -l a b) and you know how to display the contents of a short file (cat filename), it's time to practice. My suggestion is to create at least one example for each type of output redirection in Figures 15-1 and 15-2(*). Although it will take a while to work through the list, once you finish you will know more about redirection than 99 44/100 percent of the Unix users in the world.

* Footnote: Yes, I want you to practice with at least one shell from each of the two shell families. If you are not sure which shells to choose, use Bash and the Tcsh. If you normally use Bash, try the examples, then enter the tcsh command to start a Tcsh shell, then try the examples again. If you normally use the Tcsh, use that shell first, and then enter the bash command to start a Bash shell. Regardless of which shell you happen to use right now, you never know what the future will bring. I want you to understand the basic shell concepts — environment variables, shell variables, options, and redirection — for any shell you may be called upon to use.

hint
To experiment with redirection, we used a variation of the ls command:
ls -l a b > output
To make your experiments easier, you can create an alias with a simple name for this command (see Chapter 13). With a Bourne shell (Bash, Korn shell), you might use:

alias x='ls -l a b'

With a C-Shell (C-Shell, Tcsh):

alias x 'ls -l a b'

Once you have such an alias, your test commands become a lot simpler:
x > output
This is a technique worth remembering.

Figure 15-1: Bourne shell family: Redirection of standard I/O
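For quick reference, here is a recap of the Bourne shell notation covered in this chapter:

command < file              redirect standard input from file (same as 0<)
command > file              redirect standard output to file (same as 1>)
command >> file             append standard output to file
command >| file             redirect standard output, overriding noclobber
command 2> file             redirect standard error to file
command 2>> file            append standard error to file
command > file 2>&1         redirect both standard output and standard error to file
command &> file             redirect both standard output and standard error (Bash shortcut)
command1 | command2         pipe standard output of command1 to command2
command1 2>&1 | command2    pipe both standard output and standard error to command2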
Figure 15-2: C-Shell family: Redirection of standard I/O
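Similarly, here is a recap of the C-Shell family notation covered in this chapter:

command < file                    redirect standard input from file
command > file                    redirect standard output to file
command >! file                   redirect standard output, overriding noclobber
command >> file                   append standard output to file
command >>! file                  append standard output, overriding noclobber
command >& file                   redirect both standard output and standard error
command >&! file                  redirect both, overriding noclobber
command >>& file                  append both standard output and standard error
(command > outfile) >& errfile    redirect standard output and standard error to different files
command1 | command2               pipe standard output of command1 to command2
command1 |& command2              pipe both standard output and standard error to command2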
Earlier in the chapter, when we discussed the Unix philosophy, I explained that one goal of the early Unix developers was to build small tools, each of which would do one thing well. Their intention was that, when a user was faced with a problem that could not be solved by one tool, he or she would be able to put together a set of tools to do the job.

For example, let's say you work for the government and you have three large files that contain information about all the smart people in the country. Within each file, there is one line of information per person, including that person's name. Your problem is to find out how many such people are named Harley.

If you were to give this problem to an experienced Unix person, he would know exactly what to do. First, he would use the cat (catenate) command to combine the files. Then he would use the grep command to extract all the lines that contain the word Harley. Finally, he would use the wc (word count) command with the -l (line count) option, to count the number of lines.

Let's take a look at how we might put together such a solution based on what we have discussed so far. We will use redirection to store the intermediate results in temporary files, which we delete when the work is done. Skipping lightly over the details of how these commands work (we will discuss them later in the book), here are the commands to do the job. To help you understand what is happening, I have added a few comments:
cat file1 file2 file3 > tempfile1      # combine the files
grep Harley < tempfile1 > tempfile2    # extract lines containing Harley
wc -l < tempfile2                      # count the lines
rm tempfile1 tempfile2                 # remove the temporary files
Take a look at this carefully. Before we move on, make sure that you understand how, by redirecting standard output and standard input, we are able to pass data from one program to another by saving it in temporary files.

The sequence of commands we used above will work just fine. However, it has one drawback: the glue that holds everything together — redirection using temporary files — makes the solution difficult to understand. Moreover, the more complex the solution becomes, the easier it is to make a mistake.

In order to make such solutions simpler, the shell allows you to create a sequence of commands such that the standard output from one program is sent automatically to the standard input of the next program. When you do so, the connection between two programs is called a PIPE, and the sequence itself is called a PIPELINE. To create a pipeline, you type the commands you want to use separated by the | (vertical bar) character (the pipe symbol). As an example, the previous set of four commands can be replaced by a single pipeline:

cat file1 file2 file3 | grep Harley | wc -l

To understand a pipeline, you read the command line from left to right. Each time you see a pipe symbol, you imagine the standard output of one program becoming the standard input of the next program. The reason pipelines are so simple is that the shell takes care of all the details, so you don't have to use temporary files. In our example, the shell automatically connects the standard output of cat to the standard input of grep, and the standard output of grep to the standard input of wc.

With the Bourne shell family, you can combine standard output and standard error and send them both to another program. The syntax is:

command1 2>&1 | command2

where command1 and command2 are commands. In the following example, both standard output and standard error of the ls command are sent to the sort command:

ls -l file1 file2 2>&1 | sort

With the C-Shell family, the syntax is:
command1 |& command2
For example:
ls -l file1 file2 |& sort
When we talk about pipelines, we often use the
word PIPE as a verb, to refer to the sending
of data from one program to another. For
instance, in the first example, we piped the
output of cat to grep, and we
piped the output of grep to wc.
In the second example, we piped standard
output and standard error of ls to
sort.
When you think about an example such as the
ones above, it's easy to imagine an image of a
pipeline: data goes in one end and comes out
the other end. However, a better metaphor is
to think of an assembly line. The raw data
goes in at one end. It is then processed by
one program after another until it emerges, in
finished form, at the other end.
When you create a pipeline, you must use
programs that are written to read text from
standard input and write text to standard
output. We call such programs "filters", and
there are many of them. We will talk about
the most important filters in
Chapters 16-19. If you are a programmer, you can
create your own tools by writing filters of your own.
In practice, you will find that most of your
pipelines use only two or three commands in a
row. By far, the most common use for a
pipeline is to pipe the output of some
command to less (see Chapter 21),
in order to display the output of the command
one screenful at a time. For example, to
display a calendar for 2008, you can use:
cal 2008 | less
(The cal program is explained in
Chapter 8.)
One of the basic skills in mastering the art
of Unix is learning when and how to solve a
problem by combining programs into a pipeline.
When you create a pipeline, you can use as
many filters as you need, and you will
sometimes see pipelines consisting of five or
six or more programs put together in an
ingenious manner. Indeed, when it comes to
constructing pipelines, you are limited only
by your intelligence and your knowledge of
filters(*).
* Footnote
This should give no cause for concern. After
you read Chapters 16-19, you will understand
how to use the most important filters.
Moreover, as one of my readers, you are
obviously of above average intelligence.
hint
When you use a command that uses a pipe or
that redirects standard I/O, it is not
necessary to put spaces around the <,
> or | characters. However, it
is a good idea to use such spaces. For
example, instead of:
ls -l a b >output 2>errors
it is better to use:
ls -l a b > output 2> errors
Using spaces in this way minimizes the chances
of a typing error and makes your commands
easier to understand. This is especially
important when you are writing shell scripts.
There may be times when you want the output of
a program to go to two places at once. For
example, you may want to send output to both a
file and to another program at the same time.
To show you what I mean, consider the
following example:
cat names1 names2 names3 | grep Harley
The purpose of this pipeline is to display all
the lines in the files names1,
names2 and names3 that contain the
word "Harley". (The details: cat
combines the three files; grep extracts
all the lines that contain the characters
"Harley". These two commands are discussed in
Chapters 16 and 19 respectively.)
Let's say you want to save a copy of the
combined files. In other words, you want to
send the output of cat to a file
and you want to send it to grep at
the same time.
To do so, you use the tee command. The
purpose of tee is to read data from
standard input and send a copy of it to both
standard output and to a file. The syntax is:
tee [-a] file...
where file is the name of the file
where you want to send the data.
Normally, you would use tee with a
single file name, for example:
cat names1 names2 names3 | tee masterlist | grep Harley
In this example, the output of cat is
saved in a file called masterlist. At
the same time, the output is also piped to
grep.
When you use tee, you can save more
than one copy of the output by specifying more
than one file name. For example, in the
following pipeline, tee copies the
output of cat to two files, d1
and d2:
cat names1 names2 names3 | tee d1 d2 | grep Harley
If the file you name in a tee command
does not exist, tee will create it for
you. However, you must be careful, because if
the file already exists, tee will
overwrite it and the original contents will be
lost.
If you want tee to append data to the
end of a file instead of replacing the file,
use the -a (append) option. For
example:
cat names1 names2 names3 | tee -a backup | grep Harley
This command saves the output of cat to
a file named backup. If backup
already exists, nothing will be lost because
the output will be appended to the end of the
file.
The tee command is especially handy at
the end of a pipeline when you want to look at
the output of a command and save it to
a file at the same time. For example, let's
say you want to use the who command
(Chapter 8) to display information about
the userids that are currently logged in to
your system. However, you not only want to
display the information, you also want to save
it to a file named status. One way to do the
job is by using two separate commands:
who
who > status
However, by using tee, you can do it
all at once:
who | tee status
Pay particular attention to this pattern: I
want you to remember it:
command | tee file
Notice that you don't have to use another
program after tee. This is because
tee sends its output to standard output
which, by default, is the screen.
In our example, tee reads the output of
who from standard input and writes it
to both the file status and to the
screen. If you find that the output is too
long, you can pipe it to less to
display it one screenful at a time:
who | tee status | less
What's in a Name?
tee
In the world of plumbing, a "tee" connector
joins two pipes in a straight line, while
providing for an additional outlet that
diverts water at a right angle. For example,
you can use a tee to allow water to flow from
left to right, as well as downwards. The
actual connector looks like an uppercase "T".
When you use the Unix tee command, you
can imagine data flowing from left to right as
it moves from one program to another. At the
same time, a copy of the data is sent down the
stem of the "tee" into a file.
On October 11, 1964, Doug McIlroy, a Bell Labs
researcher, wrote a 10-page internal memo in
which he offered a number of suggestions and
ideas. The last page of the memo contained a
summary of his thoughts. It begins:
"To put my strongest concerns into a
nutshell:
"We should have some ways of connecting
programs like [a] garden hose — screw in
another segment when it becomes necessary to
massage data in another way..."
In retrospect, we can see that McIlroy was
saying that it should be easy to put together
programs to solve whatever problem might be at
hand. As important as the idea was, it did
not bear fruit until well over half a decade
later.
By the early 1970s, the original Unix project
was well underway at Bell Labs (see
Chapter 2). At the time, McIlroy was a
manager in the research department in which
Unix was born. He was making important
contributions to a variety of research areas,
including some aspects of Unix. For example,
it was McIlroy who demanded that Unix manual
pages be short and accurate.
McIlroy had been promoting his ideas regarding
the flow of input and output for some time.
It wasn't until 1972, however, that Ken
Thompson (see Chapter 2) finally added
pipelines to Unix. In order to add the pipe
facility, Thompson was forced to modify most
of the existing programs to change the source
of input from files to standard input.
Once this was done and a suitable notation was
devised, pipelines became an integral part of
Unix, and users became more creative than
anyone had expected. According to McIlroy,
the morning after the changes were made,
"...we had this orgy of one liners. Everybody
had a one liner. Look at this, look at
that..."
In fact, the implementation of pipelines was
the catalyst that gave rise to the Unix
philosophy. As McIlroy remembers,
"...Everybody started putting forth the Unix
philosophy. Write programs that do one thing
and do it well. Write programs to work
together. Write programs that handle text
streams, because that is a universal
interface..."
Today, well over thirty years later, the Unix
pipe facility is basically the same as it was
in 1972: a remarkable achievement. Indeed, it
is pipelines and standard I/O that, in large
part, make the Unix command line interface so
powerful. For this reason, I encourage you to
take the time to learn how to use pipelines
well and to practice integrating them into
your day-to-day work whenever you get the
chance.
To help you start your journey on the Unix
version of the yellow-brick road, I have
devoted Chapters 16-19 to filters, the raw
materials out of which you can fashion
ingenious solutions to practical problems.
Before we move on to talk about filters,
however, there is one last topic I want to
cover: conditional execution.
There will be times when you will want to
execute a command only if a previous command
has finished successfully. To do so, use the
syntax:
command1 && command2
At other times, you will want to execute a
command only if a previous command has
not finished successfully. The syntax in
this case is:
command1 || command2
This idea — executing a command only if a
previous command has succeeded or failed — is
called CONDITIONAL EXECUTION.
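How does the shell know whether or not a command finished successfully? When a program terminates, it returns a small number to the shell, called an exit status. By convention, an exit status of 0 means success; anything else means failure. With the Bourne shell family, you can display the exit status of the most recent command by echoing the special parameter $?:
grep Harley people
echo $?
If grep found at least one match, you will see 0; otherwise, you will see a nonzero number. The && and || operators simply test this value for you automatically.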
Conditional execution is mostly used within
shell scripts. However, from time to time, it
can come in handy when you are entering
commands. Here are some examples.
Let's say you have a file named people
that contains information about various
people. You want to sort the contents of
people and save the output to a file named
contacts. However, you only want to do
so if people contains the name "Harley"
somewhere in the file.
To start, how can we see if a file contains
the name "Harley"? We use the grep
command (see Chapter 19) to display all
the lines in the file that contain "Harley".
The command is:
grep Harley people
If grep is successful, it will display
the lines that contain "Harley" on standard
output. If grep fails, it will remain
silent. In our case, if grep is
successful, we then want to run the command:
sort people > contacts
If grep is unsuccessful, we don't want
to do anything.
Here is a command line that uses conditional
execution to do the job:
grep Harley people && sort people > contacts
Although this command line works, it leaves us
with a tiny problem. If grep finds any
lines in the file that meet our criteria, it
will display them on the screen. Most of the
time this would make sense but, in this case,
we don't really want to see any output. All
we want to do is run grep and test
whether or not it was successful.
The solution is to throw away the output of
grep by redirecting it to
/dev/null:
grep Harley people > /dev/null && sort people > contacts
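As an aside, most modern versions of grep support the -q (quiet) option, which tells grep to suppress its output entirely. If your version of grep has this option, you can do the same job without bothering with /dev/null:
grep -q Harley people && sort people > contacts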
Occasionally, you will want to execute a
command only if a previous command fails. For
example, suppose you want to run a program
named update that works on its own for
several minutes doing something or other. If
update finishes successfully, all is
well. If not, you would like to know about
it. The following command displays a warning
message, but only if update fails:
update || echo "The update program failed."
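If you like, you can even combine both operators to handle the two cases at once. (Strictly speaking, this is not a true if-then-else: the second message would also be displayed in the unlikely event that the first echo command itself failed. In practice, the idiom works fine.)
update && echo "The update program succeeded." || echo "The update program failed."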
hint
If you ever need to abort a pipeline that is running,
just press ^C to send the intr signal
(see Chapter 7).
This is a good way to regain control when one of the
programs in the pipeline has stopped, because it is
waiting for input.
Review Question #1:
Summarize the Unix philosophy.
Review Question #2:
In Chapter 10, I gave you three questions to
ask yourself each time you learn the syntax
for a new program:
• What does the command do?
Similarly, what are the three questions you
should ask (and answer) whenever you start to
learn a new program?
Review Question #3:
Collectively, the term "standard I/O" refers
to standard input, standard output, and
standard error. Define these three terms.
What are their abbreviations?
What does it mean to redirect standard I/O?
Show how to redirect all three types of
standard I/O.
Review Question #4:
What is a pipeline?
What metacharacter do you use to separate the
components of a pipeline?
What program would you use at the end of a
pipeline to display output one screenful at a
time?
Review Question #5:
What program do you use to save a copy of data
as it passes through a pipeline?
Applying Your Knowledge #1:
Show how to redirect the standard output of
the date command to a file named
currentdate.
Applying Your Knowledge #2:
The following pipeline counts the number of
userids that are currently logged into the
system. (The wc -w command counts
words; see Chapter 18.)
users | wc -w
Without changing the output of the pipeline,
modify the command to save a copy of the
output of users to a file named
userlist.
Applying Your Knowledge #3:
The password file (/etc/passwd)
contains one line for each userid registered
with the system. Create a single pipeline to
sort the lines of the password file, save them
to a file called userids, and then
display the number of userids on the system.
Applying Your Knowledge #4:
In the following pipeline, the find
command (explained in Chapter 25) searches all
the directories under /etc looking for
files owned by userid root. The names
of all such files are then written to standard
output, one per line. The output of
find is piped to wc -l to count
the lines:
find /etc -type f -user root -print | wc -l
As find does its work, it will generate
various error messages you don't want to see.
Your goal is to rewrite the pipeline to throw
away the error messages without affecting the
rest of the output. Show how to do this for
the Bourne Shell family.
For extra credit, see if you can devise a way
to do it for the C-Shell family. (Hint: Use a
subshell within a subshell.)
For Further Thought #1:
An important part of the Unix philosophy is
that, when you need a new tool, it is better
to combine existing tools than to write new
ones. What happens when you try to apply this
guideline to GUI-based tools?
Is that good or bad?
For Further Thought #2:
With the Bourne shell family, it is simple to
redirect standard output and standard error
separately. This makes it easy to save or
discard error messages selectively. With the
C-Shell family, separating the two types of
output is much more complex. How important is
this?
The C-Shell was designed by Bill Joy, a
brilliant programmer in his day. Why do you
think he created such a complicated system?
For Further Thought #3:
As a general rule, the world of computers
changes quickly. Why do you think so many of
the basic Unix design principles work so well
even though they were created over 30 years
ago?
• In the 1970s, the Unix user community was
small, allowing developers to experiment and
to make changes quickly. Unlike today's
operating system programmers, the original
developers did not have to concern themselves
with a large installed base of unsophisticated
users or with powerful corporate interests.
• The synergistic ideas of flexibility,
cooperation, and sharing were built into the
system from the very beginning. As a result,
anyone with a good idea would have a good
chance of having his idea incorporated into
the operating system.