Monday, May 18, 2009

Processes


The ps command reports process status. Without any options, it gives you the process status for the terminal on which you are running the command. That is, if you are logged in several times, ps will only show you the processes on that terminal and none of the others. For example, I have four sessions logged in on the system console. When I switch to one and run ps, I get

  PID TTY          TIME CMD
 1762 pts/3    00:00:00 bash
 1820 pts/4    00:00:00 bash
 1858 pts/5    00:00:00 bash
23172 pts/4    00:00:00 ps

This shows the processes running as the same user who started the ps command.

If you want to see the processes on a particular terminal, you can use the -t option. A nice aspect of the -t option is that you don't have to specify the full device name or even the "tty" portion. You can just give the tty number. For example, listing the processes on terminal pts/4 shows:

  PID TTY          TIME CMD
 1799 pts/4    00:00:00 bash
 1805 pts/4    00:00:00 su
 1806 pts/4    00:00:00 bash
 1819 pts/4    00:00:00 su
 1820 pts/4    00:00:00 bash

Note that this list contains processes that did not appear in the previous example. The reason is that the first example shows all of the processes for the current user, whereas the second shows all of the processes running on that terminal. There are more processes on that terminal than the user owns because a couple of su commands switched users, so those processes are running as a different user.
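Selecting by terminal can be sketched as a pair of small shell helpers. This is only a sketch, assuming a procps-style ps; the helper names tty_name and procs_on_tty are hypothetical. The terminal name has any leading /dev/ stripped, because not every ps accepts the full device path, and it is attached directly to -t, because some versions reject a space there.

```shell
# Hypothetical helper: normalise a terminal name for use with ps -t.
# Strips a leading /dev/ so that /dev/pts/4 becomes pts/4.
tty_name() {
    printf '%s\n' "${1#/dev/}"
}

# Hypothetical helper: list the processes on one terminal.
# The name is attached directly to -t, with no space in between.
procs_on_tty() {
    ps -t"$(tty_name "$1")"
}
```

For example, procs_on_tty /dev/pts/4 and procs_on_tty pts/4 both end up running ps -tpts/4.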

Keep in mind that on a pseudo-terminal, the terminal name also includes the pts/ prefix and not just the number. The pts shows that we are running on a "pseudo" terminal that does not physically exist. On a console or serial terminal, pts isn't used because it isn't part of the tty name, so to check the processes on tty4 I would pass tty4 (or just 4) to -t.

Note also that on older systems you could not include the /dev portion of the device name, even if you specified the tty portion. There, tty4 works as the terminal name, but /dev/tty4 doesn't.

Also be careful with spaces after the -t. Some versions require that there be no spaces. Newer versions of ps accept both the older-style single-letter options and the longer, GNU-style options. Typically I use ps aux to show all of the processes on the system and grep for a specific process (assuming I know which one I am looking for).
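One wrinkle with piping ps aux into grep is that the grep process itself usually shows up in the result, because its own command line contains the search word. A common workaround, sketched here with a hypothetical helper named psgrep, is to wrap the first character of the word in brackets: the pattern [h]ttpd still matches httpd, but no longer matches grep's own command line. The sketch assumes the word starts with an ordinary letter or digit.

```shell
# Hypothetical helper: filter process listings read from standard input
# for a name, without matching the grep process that does the filtering.
psgrep() {
    first=${1%"${1#?}"}   # first character of the search word
    rest=${1#?}           # everything after the first character
    grep -e "[$first]$rest"
}
```

It would be used as: ps aux | psgrep httpd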

If you are curious about what a particular user is running, use the -u option to find out every process that user owns.
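As a small sketch of the -u option, the hypothetical helper below lists the processes of the user given as its argument and falls back to the current user (taken from id -un) when no name is given.

```shell
# Hypothetical helper: show every process owned by a user.
# With no argument, the current user name is used.
procs_of_user() {
    ps -u "${1:-$(id -un)}"
}
```

For example, procs_of_user root lists the processes that root owns.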

Although running ps like this shows who is running what, it tells you little about the behavior of the process itself. In the section on processes, I showed you the -l option, which shows you much more information. If I add the -l (long) option, I might get output that looks like this:

  F S   UID   PID  PPID  C PRI  NI ADDR    SZ WCHAN  TTY          TIME CMD
100 S     0  1258     1  0  69   0    -   324 read_c tty1     00:00:00 mingetty
100 S     0  1259     1  0  69   0    -   523 wait4  tty2     00:00:00 login
100 S     0  1260     1  0  69   0    -   324 read_c tty3     00:00:00 mingetty
100 S     0  1261     1  0  69   0    -   324 read_c tty4     00:00:00 mingetty
100 S     0  1262     1  0  69   0    -   324 read_c tty5     00:00:00 mingetty
100 S     0  1268     1  0  69   0    -   324 read_c tty6     00:00:00 mingetty
100 S     0  1730  1682  0  69   0    -   550 wait4  pts/3    00:00:00 su
000 S     0  1731  1730  0  69   0    -   684 wait4  pts/3    00:00:00 bash
100 S     0  1805  1799  0  69   0    -   550 wait4  pts/4    00:00:00 su
000 S     0  1806  1805  0  69   0    -   682 wait4  pts/4    00:00:00 bash
100 S     0  1843  1838  0  69   0    -   550 wait4  pts/5    00:00:00 su
000 S     0  1844  1843  0  69   0    -   682 wait4  pts/5    00:00:00 bash
100 S     0  2100  1858  0  69   0    -   550 wait4  pts/5    00:00:00 su
000 S     0  2101  2100  0  69   0    -   684 read_c pts/5    00:00:00 bash
100 S     0  6779  3357  0  69   0    -   550 wait4  pts/7    00:00:00 su
000 S     0  6780  6779  0  72   0    -   694 wait4  pts/7    00:00:00 bash
100 R     0 23217  6780  0  78   0    -   797 -      pts/7    00:00:00 ps

When problems arise, I quite often want to see which process has been using the processor the longest, so I use the TIME column, which tells me the total CPU time that the process has consumed. Note that the time for all bash processes is zero seconds, though I had actually been logged into many of these sessions for several hours. The reason is that the shell spends most of its time waiting, either for you to input something or for the command that you entered to finish. Nothing out of the ordinary here. Because I am the only one on the system at the moment, these values are low. A busy database might show times measured in hours.

On one web server I was on, there was a Perl script started from a web page that had averaged about 85% of the CPU time over the last 10 days. This is not normal, since a page is usually loaded and then the process stops.

Unless I knew specifically on which terminal a problem existed, I would probably have to show every process to get something of value. This would be done with the -e option (for "everything"). The problem is that I have to look at every single line to see what the total time is. So, to make my life easier, I can pipe it to sort.
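Piping the listing through sort might look like the sketch below. It assumes the four-column PID, TTY, TIME, CMD layout that ps -e prints, and the helper name sort_by_cpu_time is hypothetical. Because TIME is printed as fixed-width HH:MM:SS, a plain string comparison on the third field is enough; times of a day or more, which some versions print with a leading day count, would need extra handling.

```shell
# Hypothetical helper: read "ps -e"-style output on standard input and
# print the processes with the largest TIME value first.
sort_by_cpu_time() {
    # Drop the header line, then sort on field 3 in reverse order.
    # -b ignores the padding blanks in front of the field, and the C
    # locale gives a plain byte-wise comparison.
    tail -n +2 | LC_ALL=C sort -b -r -k3,3
}
```

It would be used as: ps -e | sort_by_cpu_time | head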

Figuring out what is a reasonable value is not always easy. The most effective method I have found is to monitor these values while they are behaving "correctly." You then have a rough estimate of the amount of time particular processes need, and you can quickly see when something is out of the ordinary.

Something else that I use regularly is the PID-PPID pair that occurs when I use the -l option. If I come across a process that doesn't look right, I can follow the PID-to-PPID chain until I find a process with a PPID of 1. Because process 1 is init, I know that this process is the starting point. Knowing this is often useful when I have to kill a process. Sometimes, the process is in an "unkillable" state, which happens in two cases. First, the process may be making the transition to becoming defunct, in which case I can ignore it. Second, it may be stuck in some part of the code in kernel mode, in which case it won't receive my kill signal. In such cases, I have found it useful to kill one of its ancestors (such as a shell). The hung process is inherited by init and will eventually disappear. However, in the meantime, the user can get back to work. Afterward comes the task of figuring out what the problem was.
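Following the PID-to-PPID chain by hand can be sketched as a small filter. The hypothetical helper ancestors reads lines of "PID PPID" pairs on standard input and prints the chain that starts at the PID given as its argument, one PID per line, until it reaches a parent that is not in the list (for a complete list this is after process 1, whose parent is 0).

```shell
# Hypothetical helper: print a process and all of its ancestors.
# Input: one "PID PPID" pair per line. Argument: the PID to start from.
ancestors() {
    awk -v pid="$1" '
        { parent[$1] = $2 }
        END {
            # Follow PID -> PPID links until the chain runs out; the hop
            # limit guards against malformed (cyclic) input.
            for (hops = 0; (pid in parent) && hops < 1000; hops++) {
                print pid
                pid = parent[pid]
            }
        }'
}
```

On a live system, the pairs could come from ps itself, for example: ps -e -o pid= -o ppid= | ancestors $$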

However, you don't need to follow the PID-PPID chain yourself. By using the --forest option (or f in the BSD-style syntax), you can get ps to print the output in "forest" mode, whereby the family tree of each process is shown. This might look like this:

# ps au --forest
USER       PID %CPU %MEM   VSZ  RSS TTY      STAT START   TIME COMMAND
jimmo     3357  0.0  0.0  2796    4 pts/7    S    Mar27   0:00 /bin/bash
root      6779  0.0  0.0  2200    4 pts/7    S    Mar27   0:00 su
root      6780  0.0  0.2  2776  872 pts/7    S    Mar27   0:00 \_ bash
jimmo     1838  0.0  0.0  2780    4 pts/5    S    Mar27   0:00 /bin/bash
root      1843  0.0  0.0  2200    4 pts/5    S    Mar27   0:00 su -
root      1844  0.0  0.0  2728    4 pts/5    S    Mar27   0:00 \_ -bash
linux-tu  1857  0.0  0.0  2148    4 pts/5    S    Mar27   0:00 \_ su - linux-tu
linux-tu  1858  0.0  0.0  2732    4 pts/5    S    Mar27   0:00 \_ -bash
root      2100  0.0  0.0  2200    4 pts/5    S    Mar27   0:00 \_ su -
root      2101  0.0  0.2  2740  876 pts/5    S    Mar27   0:00 \_ -bash
root     23470  0.0  0.4  2764 1744 pts/5    R    14:41   0:00 \_ ps au --forest
jimmo     1799  0.0  0.0  2780    4 pts/4    S    Mar27   0:00 /bin/bash
root      1805  0.0  0.0  2200    4 pts/4    S    Mar27   0:00 su -
root      1806  0.0  0.0  2728    4 pts/4    S    Mar27   0:00 \_ -bash
linux-tu  1819  0.0  0.0  2148    4 pts/4    S    Mar27   0:00 \_ su -l linux-tu
linux-tu  1820  0.0  0.1  2736  600 pts/4    S    Mar27   0:00 \_ -bash
jimmo     1682  0.0  0.0  2784    4 pts/3    S    Mar27   0:00 /bin/bash
root      1730  0.0  0.0  2200    4 pts/3    S    Mar27   0:00 su -
root      1731  0.0  0.0  2736    4 pts/3    S    Mar27   0:00 \_ -bash
linux-tu  1761  0.0  0.0  2148    4 pts/3    S    Mar27   0:00 \_ su - linux-tu
linux-tu  1762  0.0  0.1  2740  760 pts/3    S    Mar27   0:00 \_ -bash
jimmo     1681  0.0  0.0  1536   60 pts/2    S    Mar27   0:00 tail -f /var/log/httpd/error_log
jimmo     1482  0.0  0.0  1524    4 pts/0    S    Mar27   0:00 /bin/cat
root      1268  0.0  0.0  1296    4 tty6     S    Mar27   0:00 /sbin/mingetty tty6
root      1262  0.0  0.0  1296    4 tty5     S    Mar27   0:00 /sbin/mingetty tty5
root      1261  0.0  0.0  1296    4 tty4     S    Mar27   0:00 /sbin/mingetty tty4
root      1260  0.0  0.0  1296    4 tty3     S    Mar27   0:00 /sbin/mingetty tty3
root      1259  0.0  0.3  2092 1208 tty2     S    Mar27   0:00 login -- jimmo
jimmo    23199  0.0  0.3  2708 1504 tty2     S    14:00   0:00 -bash
root      1258  0.0  0.0  1296    4 tty1     S    Mar27   0:00 /sbin/mingetty --noclear tty1

Here you see a number of different sets of related processes. In a couple of cases, I did an su to root (su -) and then to another user. In one of these cases, the ps au --forest command that produced this output can be seen.

One nice thing is that ps has its own sort options, so you don't need to pipe it through sort. Also, you can select which information you want displayed and even in which order.
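A sketch of letting ps do the selecting and sorting itself, assuming the procps version of ps with its -o option (choose the columns and their order) and its --sort option. The helper name top_cpu_procs and the choice of columns are only illustrative. The listing is sorted by the pcpu column, largest first, and cut off after the requested number of processes.

```shell
# Hypothetical helper: show the N processes with the highest CPU usage
# (default 10), with a hand-picked set of columns.
top_cpu_procs() {
    # One extra line is kept for the header that ps prints.
    ps -eo pid,ppid,user,pcpu,time,comm --sort=-pcpu | head -n "$(( ${1:-10} + 1 ))"
}
```

For example, top_cpu_procs 5 prints the header plus the five busiest processes.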

Another very useful tool is top. Without any options, it shows a continuously updated, full-screen overview of the system.

In the upper portion of the screen is some general system information about how many users are on the system, how long the system has been running, the number of processes, memory usage and so forth. In the lower portion you see a process list. This is sorted so that the most active processes are at the top of the list. One advantage of top is that it constantly refreshes, so you get updated information every five seconds (this can be changed, if you want).

As you might expect, there are many different configuration options to choose from. You can even configure top so that it cannot be interrupted (it keeps on running). That way you can start it on a terminal to give you constantly updated information about your system without the danger of someone breaking out of it and having access to a shell.
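Two of these options can be sketched briefly, assuming the procps version of top: -d sets the delay between refreshes in seconds, and -b together with -n runs top in batch mode for a fixed number of iterations, which is handy when you want a snapshot in a script or a log file. The helper name top_snapshot is hypothetical.

```shell
# Hypothetical helper: print a one-shot snapshot of top's display,
# limited to the first N lines (default 15).
top_snapshot() {
    # -b: batch mode (plain text, no screen handling)
    # -n 1: stop after a single iteration
    top -b -n 1 | head -n "${1:-15}"
}
```

Interactively, top -d 10 starts top with a ten-second refresh interval instead of the default.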

The free command will show you how much memory is being used, as well as how much memory is used for buffers and cache, and the status of your swap space. By default, the display is in kilobytes, but you can change this by using the -b option, which displays the memory usage in bytes, or -m for megabytes. Normally, free displays the statistics and then exits. By using the -s option followed by a number of seconds, you can have it redisplay the statistics continuously at that interval. Note that you do not need to use whole seconds: free uses usleep, so you can specify any floating-point value.

By default the display looks like this:

             total       used       free     shared    buffers     cached
Mem:        514712     485200      29512          0      19312     179116
-/+ buffers/cache:     286772     227940
Swap:      1574328     213468    1360860
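The "-/+ buffers/cache" row is derived from the Mem row: memory used by applications is used minus buffers minus cached (485200 - 19312 - 179116 = 286772), and memory effectively available is free plus buffers plus cached (29512 + 19312 + 179116 = 227940). The sketch below recomputes that row. It assumes the older six-column layout shown above (newer versions of free print different columns), and the helper name buffers_cache_row is hypothetical.

```shell
# Hypothetical helper: recompute the "-/+ buffers/cache" figures from
# the Mem: line of old-style free output read on standard input.
# Columns on that line: total used free shared buffers cached
buffers_cache_row() {
    awk '$1 == "Mem:" {
        used = $3 - $6 - $7    # used, not counting buffers and cache
        free = $4 + $6 + $7    # free, counting buffers and cache
        print used, free
    }'
}
```

Fed the Mem line from the example, it prints 286772 227940, which matches the second row of the display.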

The uptime command shows you how long the system has been up, how many users are on the system, and the load average over the last 1, 5 and 15 minutes. Note that the load average is not always easy to understand. It does not represent a percentage of CPU usage, but rather the sum of the run queue length and the number of jobs currently running on the CPUs. On the one hand, having many jobs running (and thus a high load average) does not necessarily mean that the system is having problems. On the other hand, it is possible for a single CPU-bound process to bring your system to a halt even though the load average is low. By default, the output looks like this:

  5:14pm  up 3 days  6:54,  18 users,  load average: 0.08, 0.18, 0.17
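If you want to watch the load averages from a script, the three numbers can be cut out of the uptime line. The sketch below uses a hypothetical helper named load_averages and assumes the numbers are printed with a decimal point, as in the output above; in a locale that prints decimal commas it would need adjusting.

```shell
# Hypothetical helper: print the 1, 5 and 15 minute load averages from
# an uptime-style line read on standard input, separated by spaces.
load_averages() {
    # Remove everything up to and including "load average:", then drop
    # the commas that separate the three values.
    sed -n 's/.*load average[s]*: *//p' | tr -d ','
}
```

It would be used as: uptime | load_averages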

Similar to the ps command is the w command. However, w does not provide as many details, as this example shows:

 17:23:35 up 3 days,  7:03, 18 users,  load average: 0.13, 0.19, 0.18
USER     TTY      LOGIN@   IDLE   JCPU   PCPU WHAT
jimmo    tty2     09:34   7:26m  0.04s  0.04s -bash
jimmo    :0       Thu10   ?xdm?  2:21m  0.02s -:0
jimmo    pts/0    Thu10   3days  0.00s  0.87s kdeinit: kwrited
jimmo    pts/4    Sat08  24:23m  0.04s  0.04s /bin/bash
jimmo    pts/5    Thu10   9:21m  0.21s  0.21s /bin/bash
jimmo    pts/6    Sat08  30:02m  0.20s  0.06s man cat
jimmo    pts/11   Thu10   7:11m  0.06s  0.06s /bin/bash
jimmo    pts/13   Thu10   2:00m  0.14s  0.14s /bin/bash
jimmo    pts/14   Thu10   0.00s  0.12s  0.02s w
jimmo    pts/15   Sat09  32:08m  0.03s  0.03s /bin/bash
jimmo    pts/16   14:33   2:49m  0.04s  0.04s /bin/bash

This gives you a quick and dirty overview of what user processes are running on the system. Note that you have a list of processes on all terminals, but none of the system processes. When using the -f option, w will show you the from field, which is the name of the remote host (if applicable).
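On a system with many logins, the w listing can be condensed into a count of sessions per user. This is only a sketch; the helper name sessions_per_user is hypothetical, and it assumes the two header lines (the uptime summary and the column titles) shown in the example above.

```shell
# Hypothetical helper: count login sessions per user from w output
# read on standard input.
sessions_per_user() {
    # Skip the two header lines, count the lines per user name in the
    # first column, and sort the result by user name.
    tail -n +3 | awk '{ count[$1]++ } END { for (u in count) print u, count[u] }' | sort
}
```

It would be used as: w | sessions_per_user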
