Monday, May 18, 2009

The Boot Process


The process of turning on your computer and having it jump through hoops to bring up the operating system is called booting, which derives from the term bootstrapping. This is an allusion to the idea that a computer pulls itself up by its bootstraps, in that smaller pieces of simple code start larger, more complex pieces to get the system running.

The process a computer goes through is similar among different computer types, whether it is a PC, Macintosh, or SPARC Workstation. In the next section, I will be talking specifically about the PC, though the concepts are still valid for other machines.

The first thing that happens is the Power-On Self-Test (POST). Here the hardware checks itself to see that things are all right. It compares the hardware settings stored in the CMOS (Complementary Metal Oxide Semiconductor) memory to what is physically on the system. Some errors, like the floppy drive type not matching, are annoying, but your system can still boot. Others, like the lack of a video card, can keep the boot process from continuing. Often, there is nothing to indicate what the problem is, except for a few little "beeps."

Once the POST is completed, the hardware jumps to a specific, predefined location in the BIOS ROM. The instructions located there are relatively simple and basically tell the hardware to go look for a boot device. Depending on how your CMOS is configured, the hardware first checks your floppy and then your hard disk.

When a boot device is found (let's assume that it's a hard disk), the hardware is told to go to the first sector (cylinder 0, head 0, sector 1), then load and execute the instructions there. This is the master boot record, or MBR for you DOS-heads (sometimes also called the master boot block). This code is small enough to fit into one sector but is intelligent enough to read the partition table (located just past the boot code, in the latter part of the same sector) and find the active partition. Once it finds the active partition, it begins to read and execute the instructions contained within the first block of that partition.
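If you are curious, you can look at the master boot record yourself. Here is a minimal sketch, assuming your first IDE disk appears as /dev/hda (it would be /dev/sda for SCSI) and that the xxd utility is installed:

# Copy the first 512-byte sector (the MBR) to a file; run as root, read-only
dd if=/dev/hda of=/tmp/mbr.bin bs=512 count=1

# Dump the end of the sector: the 64-byte partition table starts at
# offset 446 (0x1be), and the sector ends with the boot signature 55 aa
xxd /tmp/mbr.bin | tail -5

The trailing 55 aa signature is how the BIOS recognizes a valid boot sector.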

It is at this point that viruses can affect/infect Linux systems. The master boot block has the same format for essentially all PC-based operating systems, and all it does is find and execute code at the beginning of the active partition. But if the master boot block contained code that told the system to go to the very last sector of the hard disk and execute the code there, which then told the system to execute code at the beginning of the active partition, you would never know anything was wrong.

Let's assume that the instructions at the very end of the disk take up more than a single 512-byte sector. If the instructions took up a couple of kilobytes, you could get some fairly complicated code. Because it is at the end of the disk, you would probably never know it was there. What if that code checked the date in the CMOS and, if the day of the week was Friday and the day of the month was the 13th, erased the first few kilobytes of your hard disk? If that were the case, then your system would be infected with the Friday the 13th virus, and you could no longer boot your hard disk.

Viruses that behave in this way are called "boot viruses," as they affect the master boot block and can only damage your system if this is the disk from which you are booting. These kinds of viruses can affect all PC-based systems. Some computers will allow you to configure them (more on that later) so that you cannot write to the master boot block. Although this is a good safeguard against older viruses, newer ones can change the CMOS to allow writing to the master boot block. So, just because you have enabled this feature does not mean your system is safe. However, I must point out that boot viruses can only infect Linux systems if you boot from an infected disk. This will usually be a floppy, more than likely a DOS floppy. Therefore, you need to be especially careful when booting from floppies.

Now back to our story...

As I mentioned, the code in the master boot block finds the active partition and begins executing the code there. On an MS-DOS system, this code loads the IO.SYS and MSDOS.SYS files. On a Linux system, it is often the LILO (Linux loader) "program." Although IO.SYS and MSDOS.SYS are "real" files that you can look at and even remove if you want to, the LILO program is not. The LILO program is part of the partition, but not part of the file system; therefore, it is not a "real" file. Regardless of what program is booting your system and loading the kernel, it is generally referred to as a "boot loader."

Often, LILO is installed in the master boot block of the hard disk itself. Therefore, it will be the first code to run when your system is booted. In this case, LILO can be used to start other operating systems. On one machine, I have LILO start either Windows 95 or one of two different versions of Linux.
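As a rough sketch of what such a configuration looks like, here is an illustrative /etc/lilo.conf (the device names, partitions, and kernel versions are invented for the example; yours will differ):

boot=/dev/hda                # write LILO to the master boot record of the first disk
prompt                       # display the boot: prompt
timeout=50                   # wait 5 seconds, then boot the first (default) entry

image=/boot/vmlinuz-2.4.18   # one Linux kernel
    label=linux
    root=/dev/hda2
    read-only

image=/boot/vmlinuz-2.2.19   # a second Linux, with its own root file system
    label=linux-old
    root=/dev/hda3
    read-only

other=/dev/hda1              # chain-load Windows 95 from its own partition
    label=win95

Keep in mind that after editing this file you have to run the lilo command for the changes to actually be written to the boot record.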

In other cases, LILO is installed in the boot sector of a given partition. In this case, it is referred to as a "secondary" boot loader and is used just to load the Linux installed on that partition. This is useful if you have another operating system such as OS/2 or Windows NT and you use the boot software from that OS to load any others. However, neither of these was designed with Linux in mind. Therefore, I usually have LILO loaded in the master boot block and have it do all the work.

Assuming that LILO has been written to the master boot record and is, therefore, the master boot record, it is loaded by the system BIOS to a specific memory location (0x7C00) and then executed. This primary boot loader then uses the system BIOS to load the secondary boot loader to another specific memory location (0x9B000). The reason the BIOS is still used at this point is that if the secondary boot loader had to include all the code necessary to access the hardware itself, it would be extremely large (at least by comparison to its current size). Furthermore, it would need to be able to recognize and access different hardware types such as IDE and EIDE, as well as SCSI, and so forth.

This limits LILO, because it is obviously dependent on the BIOS. As a result, LILO and the secondary boot loader cannot access cylinders on the hard disk above 1023. In fact, this is a problem for other PC-based operating systems as well. There are two solutions to this problem. The original solution is simply to create the partitions so that LILO and the secondary boot loader lie at or below cylinder 1023. This is one reason for moving the boot files into the /boot directory, which is often on a separate file system that lies at the start of the hard disk.

The other solution is something called "Logical Block Addressing" (LBA). With LBA, the BIOS presents a translated geometry, so it "thinks" there are fewer cylinders (and correspondingly more heads) than there actually are. Details on LBA can be found in the section on hard disks.

Contrary to common belief, it is actually the secondary boot loader that provides the prompt and accepts the various options. The secondary boot loader is what reads the /boot/map file to determine the location of the kernel image to load.

You can configure LILO with a wide range of options. Not only can you boot with different operating systems, but with Linux you can boot different versions of the kernel as well as use different root file systems. This is useful if you are a developer because you can have multiple versions of the kernel on a single system. You can then boot them and test your product in different environments. We'll go into details about configuring LILO in the section on Installing your Linux kernel.

In addition, I always have three copies of my kernel on the system and have configured LILO to be able to boot any one of them. The first copy is the current kernel I am using. When I rebuild a new kernel and install it, it gets copied to /vmlinuz.old, which is the second kernel I can access. I then have a copy called /vmlinuz.orig, which is the original kernel from when I installed that particular release. This, at least, contains the drivers necessary to boot and access my hard disk and CD-ROM. If I can get that far, I can reinstall what I need to.
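In LILO terms, that might look something like this fragment of /etc/lilo.conf (a sketch; the labels and the root device are illustrative):

image=/vmlinuz               # the kernel currently in use
    label=linux
    root=/dev/hda2
    read-only

image=/vmlinuz.old           # the previous kernel, saved when a new one is installed
    label=old
    root=/dev/hda2
    read-only

image=/vmlinuz.orig          # the original kernel from the installation
    label=orig
    root=/dev/hda2
    read-only

Typing the appropriate label (linux, old, or orig) at the LILO prompt then boots the corresponding kernel.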

Typically, on newer Linux versions, the kernel is no longer stored in the root directory, but rather in the /boot directory. You will also find that it is common for the version number of the respective kernel to be appended to the name, for example, /boot/vmlinuz-2.4.18, which indicates that this kernel is version 2.4.18. What is important is that the kernel can be located when the system boots, not what it is called.

During the course of writing this material, I often had more than one distribution of Linux installed on my system. It was very useful to see whether the application software provided with one release was compatible with the kernel from a different distribution. Using various options to LILO, I could boot one kernel but use the root file system from a different version. This was also useful on at least one occasion when one version didn't have the correct drivers in the kernel on the hard disk and I couldn't even boot it.

Once your system boots, you will see the kernel being loaded and started. As it is loaded and begins to execute, you will see screens of information flash past. For the uninitiated, this is overwhelming, but after you take a closer look at it, most of the information is very straightforward.

Once you're booted, you can see this information in the file /usr/adm/messages. Depending on your system, this file might be in /var/adm or even /var/log, although /var/log seems to be the most common as of this writing. In the messages file, as well as during the boot process, you'll see several types of information that the system logging daemon (syslogd) is writing. The syslogd daemon usually continues logging as the system is running, although you can turn it off if you want. To look at the kernel messages after the system boots, you can use the dmesg command.
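For example, to page through the kernel's boot messages or to pick out the lines for a particular driver, you could use:

dmesg | less          # page through the kernel ring buffer
dmesg | grep ide      # show only the lines mentioning the IDE driver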

The general format for the entries is:

time hostname program: message

where time is the system time when the message is generated, hostname is the host that generated the message, program is the program that generated the message, and message is the text of the message. For example, a message from the kernel might look like this:

May 13 11:34:23 localhost kernel: ide0: do_ide_reset: success

As the system is booting, all you see are the messages themselves and not the other information. Most of what you see as the system boots are messages from the kernel, with a few other things mixed in, so you would see this message simply as

ide0: do_ide_reset: success

Much of the information that the syslogd daemon writes comes from device drivers as they perform their initialization routines. If you have hardware problems on your system, this is very useful information. One example I encountered was two pieces of hardware that were both software-configurable. However, in both cases, the software wanted to configure them with the same IRQ. I could then change the source code and recompile so that one of them was assigned a different IRQ.

You will also notice the kernel checking the existing hardware for specific capability, such as whether an FPU is present, whether the CPU has the hlt (halt) instruction, and so on.

What is logged and where it is logged is based on the /etc/syslog.conf file. Each entry is broken down into facility.priority, where facility is the part of the system, such as the kernel or the printer spooler, and priority indicates the severity of the message. The priority ranges from none, in which case no messages are logged for that facility, to emerg, which represents very significant events like kernel panics. Messages are generally logged to one file or another, though emergency messages should be displayed to everyone (usually done by default). See the syslog.conf man-page for details.
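A few illustrative lines from an /etc/syslog.conf (the log file names are typical, but your distribution may use different ones):

# Everything the kernel reports
kern.*                          /var/log/kern.log
# Mail messages of priority info and above
mail.info                       /var/log/maillog
# Emergencies are sent to all logged-in users
*.emerg                         *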

One last thing that the kernel does is start the init process, which reads the /etc/inittab file. It looks for any entry that should be run when the system is initializing (the entry has a sysinit in the third field) and then executes the corresponding command. (I'll get into details about different run-levels and these entries shortly.)
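Such an entry in /etc/inittab looks like the following; the fields are id:runlevels:action:command, and an empty runlevels field means the entry applies regardless of run level:

si::sysinit:/etc/rc.d/rc.sysinit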

The first thing init runs out of the inittab is the script /etc/rc.d/rc.sysinit, which is similar to the bcheckrc script on other systems. As with everything else under /etc/rc.d, this is a shell script, so you can take a look at it if you want. Actually, I feel that looking through the scripts and becoming familiar with which script does what, and in what order, is a good way of learning about your system.

Among the myriad of things done here are checking and mounting file systems, removing old lock and PID files, and enabling the swap space.

Note that if the file system check discovers serious problems, rc.sysinit will stop and bring you to a shell prompt, where you can attempt to clean things up by hand. Once you exit this shell, the next command to be executed (aside from an echo) is a reboot. This is done to ensure the validity of the file systems.

Next, init looks through inittab for the line with initdefault in the third field. The initdefault entry tells the system what run-level to enter initially, normally run-level 3 (without X Windows) or run-level 5 (with X Windows). Other systems use a default of run-level 1, which brings you into single-user or maintenance mode. Here you can perform certain actions without worrying users or having too many other things happening on your system. (Note: You can keep users out simply by creating the file /etc/nologin. See the nologin man-page for details.)
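The entry itself is a single line in /etc/inittab; this one sets the default to run-level 3:

id:3:initdefault:

Note that the command field is empty, as initdefault does not run anything; it only tells init which run level to enter.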

What kind of actions can you perform here? The action with the most impact is adding new software or updating existing software. Often, new software will affect old software in such a way that it is better not to have other users on the system. In such cases, the installation procedures for that software should keep you from installing unless you are in maintenance mode.

This is also a good place to configure hardware that you added or otherwise change the kernel. Although these actions rarely impact users, you will have to do a kernel rebuild. This takes up a lot of system resources and degrades overall performance. Plus, you need to reboot after doing a kernel rebuild and it takes longer to reboot from run-level 3 than from run-level 1.

If the changes you made do not require you to rebuild the kernel (say, adding new software), you can go directly from single-user to multi-user mode by running init 3. The argument to init is simply the run level you want to go into, which, for most purposes, is run-level 3. However, to shut down the system, you could bring the system to run-level 0 or 6. (See the init man-page for more details.)
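For example (these must be run as root):

init 1     # drop to single-user (maintenance) mode
init 3     # bring the system to multi-user mode
init 0     # halt the system
init 6     # reboot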

Init looks for any entry that has a 3 in the second field. This 3 corresponds to the run-level we are currently in. Run-level 3 is the same as multi-user mode.

Within the inittab, there is a line for every run level that starts the script /etc/rc.d/rc, passing the run level as an argument. The /etc/rc.d/rc script, after a little housekeeping, then starts the scripts for that run level. For each run level, there is a directory underneath /etc/rc.d, such as rc3.d, which contains the scripts that will be run for that run level.

In these directories, you may find two sets of scripts. The scripts beginning with K are the kill scripts, which are used to shut down/stop a particular subsystem. The S scripts are the start scripts. Note that the kill and start scripts are links to the files in /etc/rc.d/init.d. If there are K and S scripts with the same number, they are both linked to the same file.
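A shortened, illustrative listing of such a directory (the exact names and numbers vary from one distribution to the next):

$ ls -l /etc/rc.d/rc3.d
K15httpd -> ../init.d/httpd
S10network -> ../init.d/network
S60nfs -> ../init.d/nfs

The number after the K or S determines the order in which the scripts are run within their group.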

This is done because the scripts are started with an argument of either start or stop. The script itself then changes its behavior based on whether you told it to start or stop. Naming them something (slightly) different allows the system to run only the K scripts when it wants to stop things and only the S scripts when it wants to start things.

When the system changes to a particular run level, the first scripts that are started are the K scripts. This stops any of the processes that should not be running in that level. Next, the S scripts are run to start the processes that should be running.
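Conceptually, the relevant part of /etc/rc.d/rc boils down to something like this (a simplified sketch for run-level 3, not the actual script):

# First stop whatever should not be running in this run level...
for script in /etc/rc.d/rc3.d/K*; do
    $script stop
done

# ...then start whatever should be running
for script in /etc/rc.d/rc3.d/S*; do
    $script start
done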

Let's look at an example. On most systems, run-level 3 is almost the same as run-level 2. The only difference is that in run-level 2, NFS is not running. If you were to change from run-level 3 to run-level 2, NFS would go down. In run-level 1 (maintenance mode), almost everything is stopped.
