Chapter 19
Searching for Information and Files
One of the greatest challenges in UNIX is to find the files you want, when you want them. Even the best organization in the world, with mnemonic subdirectories and carefully named files, can break down and leave you saying to yourself, "I know it's somewhere, and I remember that it contains a bid for Acme Acres Construction to get that contract; but for the life of me, I just can't remember where it is!"

Goals for This Hour

In this hour, you learn sophisticated ways to find specific information on the UNIX system. The powerful find command and its partner, xargs, are the subject of this hour.

Finding What's Where

The more you use UNIX, the more likely you'll end up losing track of where some of your files are. In UNIX, however, there's a cool tool to help you find them again.

Task 19.1: The find Command and Its Weird Options

Description: The grep family can help you find files by their content. There are a lot of other ways to look for things in UNIX, and that's where the find command can help. This command has a notation unlike that of most other UNIX commands: it has full-word options rather than single-letter options. Instead of -n pattern to match filenames, for example, find uses -name pattern.

The general format for this command is to specify the starting point for a search through the file system, followed by any actions desired. The list of possible options, or flags, is shown in Table 19.1.

Option          Meaning
-atime n        True if the file was accessed n days ago.
-ctime n        True if the file's status (inode) was last changed n days ago.
-exec command   Execute command.
-mtime n        True if the file was modified n days ago.
-name pattern   True if the filename matches pattern.
-print          Print names of files found.
-type c         True if the file is of type c (as shown in Table 19.2).
-user name      True if the file is owned by user name.

Table 19.1. Useful options for the find command.

The find command checks the specified options, going from left to right, once for each file or directory encountered. Further, find with any of the time-oriented options can search for files more recent than, older than, or exactly the same age as a specified number of days, with the specifications -n, +n, and n, respectively. Some examples will make this clear.
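
The three forms of a time argument can be tried out in a scratch directory before relying on them with real files. What follows is a minimal sketch, assuming a POSIX shell with mktemp available; the filenames new.c and old.c and the year-2000 timestamp are invented for the demonstration.

```shell
#!/bin/sh
# Work in a throwaway directory so no real files are touched.
# (mktemp -d is assumed to be available on this system.)
sandbox=$(mktemp -d) || exit 1
cd "$sandbox" || exit 1

touch new.c                    # hypothetical file, modified just now
touch -t 200001010000 old.c    # hypothetical file, backdated to 1 Jan 2000

# -7  matches files modified less than 7 days ago.
recent=$(find . -name "*.c" -mtime -7 -print)

# +30 matches files modified more than 30 days ago.
stale=$(find . -name "*.c" -mtime +30 -print)

echo "modified within the last 7 days: $recent"
echo "untouched for more than 30 days: $stale"

# Clean up the scratch directory.
cd / && rm -rf "$sandbox"
```

A bare number such as -mtime 7 would match neither file here, because neither was modified exactly seven days ago.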

Action!

  1. At its simplest, find can be used to create a list of all files and directories below the current directory:
    % find . -print
    .
    ./OWL
    ./OWL/owl.h
    ./OWL/owl
    ./OWL/owl.c
    ./OWL/simple.editor.c
    ./OWL/ask.c
    ./OWL/simple.editor.o
    ./OWL/owl.o
    
    
    lots and lots of output removed

    ./dead.letter
    ./who.is.who
    ./src.listing
    ./tmp.listing
    ./.wrongwords
    ./papert.article

  2. To limit the output to just those files that are C source files (those that have a .c suffix), I can use the -name option before the -print option:
    % find . -name "*.c" -print
    ./OWL/owl.c
    ./OWL/simple.editor.c
    ./OWL/ask.c
    ./OWL/handout.c
    ./OWL/WordMap/msw_to_txt.c
    ./OWL/WordMap/newtest.c
    ./OWL/feedback.c
    ./OWL/define.c
    ./OWL/spell.c
    ./OWL/submit.c
    ./OWL/utils.c
    ./OWL/parse.c
    ./OWL/sendmail.c
    ./owl.c
    ./src/calc.c
    ./src/info.c
    ./src/fixit.c
    ./src/massage.c
    
    Because find evaluates its options from left to right, -print is reached only for files that have already passed the -name test.

  3. To find just those files that have been modified in the last seven days, I can use -mtime with the argument -7 (include the hyphen):
    % find . -mtime -7 -name "*.c" -print
    ./OWL/owl.c
    ./OWL/simple.editor.c
    ./OWL/ask.c
    ./OWL/utils.c
    ./OWL/sendmail.c
    
    If I just use the number 7 (without a hyphen), I will match only those files that were modified exactly seven days ago:
    % find . -mtime 7 -name "*.c" -print
    %
    
    To find those C source files that I haven't touched for at least 30 days, I use +30:
    % find . -mtime +30 -name "*.c" -print
    ./OWL/WordMap/msw_to_txt.c
    ./OWL/WordMap/newtest.c
    ./src/calc.c
    ./src/info.c
    ./src/fixit.c
    ./src/massage.c
    

  4. With find, I now have a tool for looking across vast portions of the file system for specific file types, filenames, and so on.

    To look across the /bin and /usr directory trees for filenames that contain the pattern cp, I can use the following command:

    % find /bin /usr -name "*cp*" -print
    /usr/diag/sysdcp
    /usr/spool/news/alt/bbs/pcbuucp
    /usr/spool/news/alt/sys/amiga/uucp
    /usr/spool/news/comp/mail/uucp
    /usr/spool/news/comp/os/cpm
    /usr/spool/news/comp/protocols/tcp-ip
    /usr/man/man4/tcp.4p
    /usr/man/man8/tcpd.8l
    /usr/man/cat3f/%unitcp.3f.Z
    /usr/man/cat3f/unitcp.3f.Z
    /usr/unsup/bin/cpio
    /usr/unsup/gnu/man/man1/cccp.1
    /usr/news/cpulimits
    /usr/doc/local/form/cp
    /usr/doc/local/form/cpio
    /usr/doc/local/form/rcp
    /usr/doc/uucp
    

    Note: This type of search can take a long time on a busy system. When I ran this command on my system, it took almost an hour to complete!

  5. To find a list of the directories I've created in my home directory, I can use the -type specifier with one of the values shown in Table 19.2. Here's one example:
    % find . -type d -print
    .
    ./OWL
    ./OWL/Doc
    ./OWL/WordMap
    ./.elm
    ./Archives
    ./InfoWorld
    ./InfoWorld/PIMS
    ./Mail
    ./News
    ./bin
    ./src
    ./temp
    %
    

    Letter  Meaning
    d       Directory
    f       Regular file
    l       Symbolic link

    Table 19.2. Helpful find -type file types.

  6. To find more information about each of these directories, I can use the -exec option to find. Unfortunately, I cannot simply append the command: the -exec option must be used with {}, which is replaced by each matched filename, and \; at the end of the command. (If the \ is left out, the C shell will interpret the ; as the end of the find command.) You also must ensure that there is a space between the {} and the \;.
    % find . -type d -exec ls -ld {} \;
    drwx------ 11 taylor       1024 Dec 10 14:13 .
    drwx------  4 taylor        532 Dec  6 18:31 ./OWL
    drwxrwx---  2 taylor        512 Dec  2 21:18 ./OWL/Doc
    drwxrwx---  2 taylor        512 Nov  7 11:52 ./OWL/WordMap
    drwx------  2 taylor        512 Dec 10 13:30 ./.elm
    drwx------  2 taylor        512 Nov 21 10:39 ./Archives
    drwx------  3 taylor        512 Dec  3 02:03 ./InfoWorld
    drwx------  2 taylor        512 Sep 30 10:38 ./InfoWorld/PIMS
    drwx------  2 taylor       1024 Dec  9 11:42 ./Mail
    drwx------  2 taylor        512 Oct  6 09:36 ./News
    drwx------  2 taylor        512 Dec 10 13:58 ./bin
    drwx------  2 taylor        512 Oct 13 10:45 ./src
    drwxrwx---  2 taylor        512 Nov  8 22:20 ./temp
    

  7. The find command is commonly used to remove core files that are more than a few days old. These core files are copies of the actual memory image of a running program when the program dies unexpectedly. They can be huge, so occasionally trimming them is wise:
    % find . -name core -ctime +4 -exec /bin/rm -f {} \;
    %
    
    There's no output from this command because I didn't use -print at the end of the command. What it does is find all files called "core" whose status (inode) last changed more than four days ago and remove them.
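
Because a mistyped pattern combined with rm -f is unforgiving, one cautious habit is to run the search with -print alone first and add the -exec clause only once the list looks right. The sketch below assumes a POSIX shell and mktemp; the proj directory and the timestamp are made up. It tests -mtime rather than -ctime, because touch can backdate a file's modification time but not its inode change time.

```shell
#!/bin/sh
# Build a small throwaway tree (mktemp -d is assumed to be available).
sandbox=$(mktemp -d) || exit 1
cd "$sandbox" || exit 1
mkdir proj
touch -t 200001010000 proj/core   # hypothetical stale core file
touch core                        # fresh core file: too new to match
touch proj/core.c                 # name is not exactly "core": never matches

# Step 1: preview only. Nothing is removed yet.
preview=$(find . -name core -mtime +4 -print)
echo "would remove: $preview"

# Step 2: the same tests, now printing each name and then removing it.
find . -name core -mtime +4 -print -exec rm -f {} \;

# Show what survived.
remaining=$(find . -type f -print | sort)
echo "left behind:"
echo "$remaining"

# Clean up the scratch directory.
cd / && rm -rf "$sandbox"
```

Placing -print before -exec also gives a record of what was deleted, since find reaches both actions only for files that passed the earlier tests.
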
Summary

The find command is a powerful command in UNIX. It helps you find files by owner, type, filename, and other attributes. The most awkward part of the command is the required elements of the -exec option, and that's where the xargs command helps immensely.


Task 19.2: Using find with xargs

Description: You can use find to search for files, and you can use grep to search within files, but what if you want to search a combination? That's where xargs is helpful.

Action!

  1. A few days ago, I was working on a file that was computing character mappings of files. I'd like to find it again, but I don't remember either the filename or where the file is located.

    First off, what happens if I use find and have the -exec argument call grep to find files containing a specific pattern?

    % find . -type f -exec grep -i mapping {} \;
    typedef struct mappings {
    map_entry character_mapping[] = {
    int         long_mappings = FALSE;
              case 'l': long_mappings = TRUE;
                if (long_mappings)
            /** do a short mapping **/
            /** do a long mapping **/
            /** Look up the specified character in the mapping database **/
            while ((character_mapping[pointer].key < ch) &&
                   (character_mapping[pointer].key > 0))
            if (character_mapping[pointer].key == ch)
              return ( (map_entry *) &character_mapping[pointer]);
    # map,uucp-map    = The UUCP Mapping Project = nca-maps@apple.com
    grep -i "character*mapping" * */* */*/*
    to print PostScript files produced by a mapping application that
    [ccc]runs on the
    bionet.genome.chromosomes       Mapping and sequencing of 
    [ccc]eucaryote chromosomes.
    ./bin/my.new.cmd: Permission denied
    typedef struct mappings {
    map_entry character_mapping[] = {
    int         long_mappings = FALSE;
              case 'l': long_mappings = TRUE;
                if (long_mappings)
            /** do a short mapping **/
            /** do a long mapping **/
            /** Look up the specified character in the mapping database **/
            while ((character_mapping[pointer].key < ch) &&
                   (character_mapping[pointer].key > 0))
            if (character_mapping[pointer].key == ch)
              return ( (map_entry *) &character_mapping[pointer]);
    or lower case values. The table mapping upper to
    
    The output is interesting, but it doesn't contain any filenames!

  2. A second, smarter strategy would be to use the -l flag to grep so that grep specifies only the matched filename:
    % find . -type f -exec grep -l -i mapping {} \;
    ./OWL/WordMap/msw_to_txt.c
    ./.elm/aliases.text
    ./Mail/mark
    ./News/usenet.alt
    ./bin/my.new.cmd: Permission denied
    ./src/fixit.c
    ./temp/attach.msg
    

  3. That's a step in the right direction, but the problem with this approach is that each time find matches a file, it invokes grep, which is a very resource-intensive strategy. Instead, you can use xargs to read the output of find and build calls to grep that specify a lot of files at once (remember that grep checks through every file it is given). This way, grep is called only four or five times even though it might check through 200 or 300 files. By default, xargs tacks the list of filenames onto the end of the specified command, so using it is as easy as can be:
    % find . -type f -print | xargs grep -l -i mapping
    ./OWL/WordMap/msw_to_txt.c
    ./.elm/aliases.text
    ./Mail/mark
    ./News/usenet.alt
    ./bin/my.new.cmd: Permission denied
    ./src/fixit.c
    ./temp/attach.msg
    
    This gave the same output, but it was a lot faster.

  4. What's nice about this approach to working with find is that because grep is getting multiple filenames, it will automatically include the filename of any file that contains a match when grep shows the matching line. Removing the -l flag results in exactly what I want:
    % ^-l^
    find . -type f -print | xargs grep -i mapping
    ./OWL/WordMap/msw_to_txt.c:typedef struct mappings {
    ./OWL/WordMap/msw_to_txt.c:map_entry character_mapping[] = {
    ./OWL/WordMap/msw_to_txt.c:int         long_mappings = FALSE;
    ./OWL/WordMap/msw_to_txt.c:       case 'l': long_mappings = TRUE;
    ./OWL/WordMap/msw_to_txt.c:         if (long_mappings)
    ./OWL/WordMap/msw_to_txt.c:     /** do a short mapping **/
    ./OWL/WordMap/msw_to_txt.c:     /** do a long mapping **/
    ./OWL/WordMap/msw_to_txt.c:     /** Look up the specified character in
    [ccc]the mapping database **/
    ./OWL/WordMap/msw_to_txt.c:     while ((character_mapping[pointer].key 
    ./src/fixit.c:  /** do a long mapping **/
    ./src/fixit.c:  /** Look up the specified character in the
    [ccc]mapping database **/
    ./src/fixit.c:  while ((character_mapping[pointer].key < ch) &&
    ./src/fixit.c:         (character_mapping[pointer].key > 0))
    ./src/fixit.c:  if (character_mapping[pointer].key == ch)
    ./src/fixit.c:    return ( (map_entry *) &character_mapping[pointer]);
    ./temp/attach.msg:or lower case values. The table mapping upper to
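
The speedup comes from how few times the command is started. A rough way to see this is to substitute echo for grep and count output lines, since each invocation of echo prints exactly one line. The sketch below assumes a POSIX shell and mktemp, and the 50 scratch files are invented for the demonstration. It also tries the {} + terminator to -exec, a POSIX alternative that batches filenames in a similar way to xargs, on the assumption that the local find supports it.

```shell
#!/bin/sh
# Create 50 small hypothetical files in a throwaway directory.
sandbox=$(mktemp -d) || exit 1
cd "$sandbox" || exit 1
i=1
while [ "$i" -le 50 ]; do
    echo "mapping $i" > "file$i.txt"
    i=$((i + 1))
done

# -exec ... \; starts the command once per file: one output line per file.
per_file=$(find . -type f -exec echo {} \; | wc -l | tr -d ' ')

# xargs packs many filenames onto each command line.
batched=$(find . -type f -print | xargs echo | wc -l | tr -d ' ')

# -exec ... {} + asks find itself to do the same kind of batching.
plus=$(find . -type f -exec echo {} + | wc -l | tr -d ' ')

echo "invocations with -exec \\; : $per_file"
echo "invocations with xargs    : $batched"
echo "invocations with -exec +  : $plus"

# Clean up the scratch directory.
cd / && rm -rf "$sandbox"
```

With only 50 short filenames, everything fits on a single command line; a larger tree would need several batches, but still far fewer than one per file.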
    
Summary

When used in combination, find, grep, and xargs are a potent team to help find files lost or misplaced anywhere in the UNIX file system. I encourage you to experiment further with these important commands to find ways they can help you work with UNIX.
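
One thing worth experimenting with is filenames that contain spaces, because xargs splits its input on whitespace by default. The sketch below assumes a POSIX shell and mktemp, and further assumes that find supports -print0 and xargs supports -0; GNU and BSD versions do, but older systems may not. The two filenames are invented for the demonstration.

```shell
#!/bin/sh
# Two hypothetical files with the same content, one with a space in its name.
sandbox=$(mktemp -d) || exit 1
cd "$sandbox" || exit 1
echo "character mapping table" > "my notes.txt"
echo "character mapping table" > plain.txt

# Default splitting: "./my notes.txt" reaches grep as two separate names,
# neither of which exists, so that file is silently missed
# (grep's error messages are discarded here).
naive=$(find . -type f -print | xargs grep -l -i mapping 2>/dev/null | sort)

# NUL-separated names: each filename arrives intact.
# (-print0 and -0 are assumed to be supported.)
safe=$(find . -type f -print0 | xargs -0 grep -l -i mapping | sort)

echo "default splitting found:"
echo "$naive"
echo "NUL-separated names found:"
echo "$safe"

# Clean up the scratch directory.
cd / && rm -rf "$sandbox"
```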


Summary

The find command is one of the more potent commands in UNIX. It has a lot of esoteric options, and to get the full power out of find, xargs, and grep, you need to experiment.

Workshop

This Workshop poses some questions about the topics presented in this chapter. It also provides you with a preview of what you will learn in the next hour.

Questions

  1. Use find and wc -l to count how many files you have. Be sure to include the -type f option so that you don't include directories in the count.

  2. Use the necessary commands to list the following:
    • ALL filenames that contain abc
    • ALL files that contain abc

Preview of the Next Hour

The next hour introduces you to the various UNIX tools available to communicate with other users, whether interactively character-by-character or with one of the many e-mail utilities.