Monday, May 18, 2009

Devices and Device Nodes


In UNIX, nothing works without devices. I mean nothing. Getting input from a keyboard and displaying it on your screen both require devices. Accessing data on the hard disk and printing a report require devices as well. In an operating system like DOS, the input and output functions are almost entirely hidden from you by the operating system, which uses special software called "device drivers" or simply "drivers". Drivers for these devices must exist for you to be able to use them, though they are hidden behind the cloak of the operating system.

Although they access the same physical hardware, device drivers under UNIX are more complex than their DOS cousins. Adding new drivers is easier under DOS, but Linux provides more flexibility in modifying those you already have. Linux provides a mechanism to simplify adding these input and output functions, along with a large set of tools and utilities to modify and configure how your system and these device drivers interact.

Many new Linux users have trouble with device nodes for a number of different reasons. For the uninitiated, it is often difficult to figure out exactly which device node is needed for a particular task. Part of this is because the device node names aren't exactly intuitive, and part of it is because it's often not obvious which device node is the one you actually need.

One of the first problems encountered by new Linux users is with hard disks. Users almost always come from a Windows background, and they are used to accessing hard disks, CD-ROMs, and floppies by using drive letters (like the C:\ or D:\ drives in Windows, for example). For the most part, Windows users do not even care where the various partitions are; they just know which drive letter to use to access a particular file or directory. In most cases, that's all they really need to know.

With Linux (or any Unix variant), however, the situation is very different. Although new installation procedures and administration tools have made things a lot easier, there still comes a time when you need to know that the device node /dev/hda1 relates to your hard disk and /dev/tty1 is a console terminal. For most day-to-day activity you can get by with simply knowing the names of the devices and what they are used for. But even learning that can be a daunting task. There are just so many unintuitive names to deal with. Still, with a little time and practice, the function of these devices should become clear, and soon you'll be using them like an old pro.

What's in a Name?

So what exactly is a device node? It's basically a file. Like all other dialects of Unix, Linux accesses hardware devices just as if it were reading or writing any other file. This makes writing programs for Linux easier because the system can use many of the same functions to access both hardware devices and "normal" files.
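To make the "it's basically a file" idea concrete, here is a minimal Python sketch. It assumes a Linux system where /dev/null exists; the helper names write_bytes and read_bytes are made up for this illustration. The point is only that the same open, read, and write calls work on a regular file and on a device node.

```python
import os
import tempfile


def write_bytes(path: str, data: bytes) -> int:
    """Open path for writing, write data, and return the number of bytes written."""
    fd = os.open(path, os.O_WRONLY)
    try:
        return os.write(fd, data)
    finally:
        os.close(fd)


def read_bytes(path: str, count: int) -> bytes:
    """Open path for reading and return up to count bytes."""
    fd = os.open(path, os.O_RDONLY)
    try:
        return os.read(fd, count)
    finally:
        os.close(fd)


if __name__ == "__main__":
    # A regular file: what we write is what we read back.
    with tempfile.NamedTemporaryFile() as tmp:
        write_bytes(tmp.name, b"hello")
        print(read_bytes(tmp.name, 16))
    # A device node: the very same calls, but /dev/null discards
    # whatever is written and reports end-of-file on every read.
    print(write_bytes("/dev/null", b"hello"))
    print(read_bytes("/dev/null", 16))
```

Nothing in the two helpers knows or cares whether the path is a device; that decision is left entirely to the kernel.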

Device nodes (often referred to simply as "devices" in casual conversation) are the files that the kernel, applications, and even command-line tools use when they need to access the hardware. You can think of the device node (or file) as providing an interface similar to a telephone jack. The phone jack provides a convenient and standardized way of attaching things to the phone line, but the jack is not the phone line itself. It doesn't matter if you're plugging a telephone, a modem, or a fax machine into the jack, because all of these use the same interface. Similarly, your printer doesn't care if it's being accessed by the kernel, by a word processor, or by a graphics program, because they all do so through the same interface.

The downside to all of this is that device nodes and the concept of accessing hardware through them can be confusing to users who are unfamiliar with these ideas. There are, however, parallels in the DOS and Windows world. Using names such as A:, COM1:, and PRN: to access hardware in DOS is not all that different from using device nodes to access hardware under Linux (at least from the user's point of view).

In order to access the hardware in this fashion, the operating system has to refer to each piece of hardware by a unique name. In Linux, for example, /dev/fd0 is the name for the floppy drive, similar to the A: that DOS uses. In DOS, the name assigned to the printer is PRN:, while in Linux it's /dev/lp0. In order for you to access these devices, you simply have to know their names.

Since device nodes are just files on the hard disk, they are treated like files. On most systems, everyone can at least look at them, and the system administrator (root) can access the device nodes directly, just like any other file.

As with other files on your computer, device nodes are assigned specific permissions that allow some people to read from and write to them, but limit other people's access. These permissions are the safety mechanism that prevents unfortunate accidents such as random disk overwrites from happening. If you do have access to read from and write to the various device nodes, you could actually overwrite the hard disk. This, among other reasons, is why you really do have to be very careful about what you do when you're logged in to your system as root.
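If you want to check the permissions on a device node from a program, something like the following Python sketch should do. The function names mode_string and can_write are invented for this example, and /dev/null is used only because it exists on practically every Linux system.

```python
import os
import stat


def mode_string(mode: int) -> str:
    """Turn a raw st_mode value into the familiar ls-style string."""
    return stat.filemode(mode)


def node_permissions(path: str) -> str:
    """Return the ls-style permission string for the node at path."""
    return mode_string(os.stat(path).st_mode)


def can_write(path: str) -> bool:
    """Report whether the current user is allowed to write to the node."""
    return os.access(path, os.W_OK)


if __name__ == "__main__":
    # The first character is the file type: 'c' for a character
    # device, 'b' for a block device, '-' for a regular file.
    print(node_permissions("/dev/null"))
    print(can_write("/dev/null"))
```

Running the same check against a disk node as an ordinary user will normally show that you are not allowed to write, which is exactly the safety mechanism described above.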

Odds and Ends

There are a couple of oddities about Linux device nodes that need to be addressed. The first actually applies to all dialects of Unix and is related to the difference between a block device and a character device. The general misconception is that character devices are only read one character at a time. This is not the case. Character devices differ from block devices in that they are typically read sequentially rather than randomly. Hard drives are block devices because they can be accessed randomly, and terminals are character devices because they are accessed sequentially.

However, this is only a convention and not a hard and fast rule. In many cases, you read block devices one character at a time.
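Whether a given node is a block or a character device is recorded in its file mode, so you can ask the system rather than guess from the name. This is a small Python sketch using the standard stat module; device_kind and kind_of are names chosen just for this example.

```python
import os
import stat


def device_kind(mode: int) -> str:
    """Classify a raw st_mode value as a block device, character device, or neither."""
    if stat.S_ISBLK(mode):
        return "block"
    if stat.S_ISCHR(mode):
        return "character"
    return "not a device"


def kind_of(path: str) -> str:
    """Classify the node at path (symbolic links are followed)."""
    return device_kind(os.stat(path).st_mode)


if __name__ == "__main__":
    # /dev/null is a character device on Linux; other paths
    # can be passed in the same way.
    print(kind_of("/dev/null"))
```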

Under Linux (as well as other Unix dialects), access to block devices goes through a system cache called the buffer cache. One key advantage of the buffer cache is that the system can keep track of recently accessed blocks. If a process needs to read something that is still in the buffer cache (and has not been changed), there is no need to re-read the device. Instead, the system simply passes the block from the buffer to the process.

When writing back to a block device, the process is similar. The process thinks it is writing to the device, but is actually writing to the buffer cache. This block is marked as "dirty" and will be written to the disk when the system gets around to it. If a process needs to read the block, then there is again no need to access the device directly.
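The read and write behavior described above can be modeled in a few lines of Python. This is a toy, not how the kernel is written: the "device" is just a dictionary of block numbers, the class name ToyBufferCache is made up, and flush is called by hand instead of by a timer.

```python
class ToyBufferCache:
    """A toy write-back cache sitting in front of a dict that plays the disk."""

    def __init__(self, device: dict):
        self.device = device      # block number -> bytes, the "disk"
        self.blocks = {}          # cached copies of blocks
        self.dirty = set()        # blocks changed but not yet written back
        self.device_reads = 0     # how often we really had to hit the disk

    def read(self, number: int) -> bytes:
        # Only go to the device if the block is not cached yet.
        if number not in self.blocks:
            self.blocks[number] = self.device[number]
            self.device_reads += 1
        return self.blocks[number]

    def write(self, number: int, data: bytes) -> None:
        # The write lands in the cache and the block is marked dirty;
        # the device itself is not touched yet.
        self.blocks[number] = data
        self.dirty.add(number)

    def flush(self) -> None:
        # "When the system gets around to it": write dirty blocks back.
        for number in sorted(self.dirty):
            self.device[number] = self.blocks[number]
        self.dirty.clear()


if __name__ == "__main__":
    disk = {0: b"old"}
    cache = ToyBufferCache(disk)
    cache.read(0)
    cache.read(0)                 # second read is served from the cache
    cache.write(0, b"new")
    print(disk[0], cache.read(0)) # disk still holds the old data
    cache.flush()
    print(disk[0])
```

Between write and flush the disk and the cache disagree, and that window is what the next paragraph is about.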

Note that there is a delay in writing the information to the disk. If something happens to the computer before the data stored in the buffer is written (for example, a power outage), there is a possibility that the data could be lost. The delay is fairly short (default 30 seconds for data buffers and 5 seconds for metadata buffers), however, so it's unlikely that too much will be lost. In addition, it is possible to use the O_SYNC flag when opening the device, which makes each write wait until the data has actually been written out rather than just placed in the buffer.
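Here is roughly what using O_SYNC looks like from Python. To keep it safe to run, the sketch writes to an ordinary temporary file rather than a real device; the function names are made up. The second helper uses os.fsync, which the text above does not mention, as an alternative way of asking for the data to be pushed out after a normal buffered write.

```python
import os
import tempfile


def write_synchronously(path: str, data: bytes) -> int:
    """Write data with O_SYNC so each write waits for the data to be written out."""
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC | os.O_SYNC, 0o600)
    try:
        return os.write(fd, data)
    finally:
        os.close(fd)


def write_then_flush(path: str, data: bytes) -> int:
    """Write normally, then ask the system to flush this file's buffers."""
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o600)
    try:
        written = os.write(fd, data)
        os.fsync(fd)  # not from the article: an explicit flush request
        return written
    finally:
        os.close(fd)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as workdir:
        target = os.path.join(workdir, "sync_demo")
        print(write_synchronously(target, b"important data"))
        print(write_then_flush(target, b"more data"))
```

The price of O_SYNC is speed, since every write now has to wait for the hardware, so it is usually reserved for data you really cannot afford to lose.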

Another oddity that you will find on Linux systems is that a large portion of the major numbers are repeated. That is, there are often two completely unrelated devices that have the same major number. For example, hard disks and pseudo-ttys (when using telnet) both have a major number of 3. Some Unix dialects, such as SCO, use the same major number for the block and character versions of the same device. Despite this, the kernel is still capable of determining which driver is needed, because the major number is not the only thing used to differentiate between them; the device type (block or character) is taken into account as well.
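One way to picture how a repeated major number stays unambiguous is to treat the device type and the major number together as the lookup key. The Python sketch below is purely illustrative: the DRIVERS table and the driver_key function are invented for this example and hold only the two entries mentioned above.

```python
import os
import stat

# Hypothetical lookup table: the key is (device type, major number).
DRIVERS = {
    ("block", 3): "IDE hard disk",
    ("char", 3): "pseudo-tty",
}


def driver_key(mode: int, rdev: int) -> tuple:
    """Build the (type, major) key from a node's st_mode and st_rdev values."""
    if stat.S_ISBLK(mode):
        kind = "block"
    elif stat.S_ISCHR(mode):
        kind = "char"
    else:
        raise ValueError("not a device node")
    return (kind, os.major(rdev))


if __name__ == "__main__":
    # Same major number, different type, different driver.
    disk = driver_key(stat.S_IFBLK, os.makedev(3, 1))
    pty = driver_key(stat.S_IFCHR, os.makedev(3, 0))
    print(disk, DRIVERS[disk])
    print(pty, DRIVERS[pty])
```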

A Rose by Any Other Name

It is possible to have two device nodes that point at the same device. These nodes can have different names, but if they are of the same device type and they have the same major-minor number pair, they are actually pointing at the same device.
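The rule just stated (same device type, same major-minor pair) translates almost directly into code. This Python sketch uses only the standard os and stat modules; identity and same_device are names picked for the example.

```python
import os
import stat


def identity(mode: int, rdev: int) -> tuple:
    """Return (file type bits, major, minor) for a node's st_mode and st_rdev."""
    return (stat.S_IFMT(mode), os.major(rdev), os.minor(rdev))


def same_device(path_a: str, path_b: str) -> bool:
    """True if both paths are device nodes of the same type and major-minor pair."""
    a = os.stat(path_a)
    b = os.stat(path_b)
    for info in (a, b):
        if not (stat.S_ISBLK(info.st_mode) or stat.S_ISCHR(info.st_mode)):
            return False  # at least one path is not a device node at all
    return identity(a.st_mode, a.st_rdev) == identity(b.st_mode, b.st_rdev)


if __name__ == "__main__":
    # Trivially true, but it shows the call; on a real system you
    # would compare two differently named nodes instead.
    print(same_device("/dev/null", "/dev/null"))
```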

So, why would anyone want to have two device nodes pointing at the same device? The biggest reason for this is convenience. It is extremely useful to name a device in such a way that we mere mortals can recognize it. There are several common devices on Linux systems that have more than one name, one being the swap device.

On my system, the swap partition is the fourth primary partition on the first SCSI hard disk. Under the device node naming scheme we discussed earlier, it is called /dev/sda4. Remembering that the swap partition is /dev/sda4, however, isn't all that easy. For this reason, the swap partition is also usually called /dev/swap. This is much more recognizable than the name given it under the standard naming scheme. While /dev/sda4 tells me where the swap partition is, /dev/swap tells me what it is.

Another common device that uses this trick is /dev/tape. In my case, it is the same as /dev/st0, which is my first SCSI tape drive. However, if I access /dev/tape, it really does not matter if my tape drive is SCSI or not, as the system does the work for me.

One thing to note is that you cannot simply copy device nodes using cp. In addition, you should not just create new device nodes for this purpose using the mknod command. Although this would get you two identical device nodes, when you change one, the other is unaffected. For this reason, you should create links between the device nodes rather than making duplicates of them.

One thing I use this linking mechanism for is my FAT partitions. Since I need filesystems that are available from Linux, Windows NT, and a couple of other operating systems, I have several FAT partitions. In order to make things simpler for me, I do one of two things. Either I create links using the DOS/Windows drive letter or I create links with the name by which the drive is shared.

For example, my data is stored on what appears as the G:\ drive under DOS/Windows, and which resides on the Linux partition /dev/sdb6. I might have a device node /dev/dos_g that is linked to /dev/sdb6. The /dev/dos_g name tells me that this partition appears under DOS as drive G:\. Since the drive is also shared with Samba, I might create a link /dev/Data, which is the share name. These (along with other FAT partitions) are then mounted automatically through the /etc/fstab file when the system boots. Remembering that /dev/dos_g is the same as the G:\ drive in DOS is much simpler than trying to remember /dev/sdb6.

Whether you create hard links or symbolic links is almost a matter of personal preference. Typically, however, symbolic links are used. If you look in the /dev directory, you will see a number of devices that are already symbolic links. Therefore, I think it is better to stick with what is already on your system.
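As a rough illustration of the symbolic link approach, here is a Python sketch. To keep it runnable without root, it creates the link in a temporary directory and points it at /dev/null; the name null_alias is made up. Creating something like /dev/dos_g for real works the same way, but needs root privileges and the actual partition name from your own system.

```python
import os
import tempfile


def make_alias(target: str, alias: str) -> None:
    """Create a symbolic link named alias that points at the device node target."""
    os.symlink(target, alias)


def is_alias_for(alias: str, target: str) -> bool:
    """True if alias is a symbolic link that resolves to the same node as target."""
    return os.path.islink(alias) and os.path.samefile(alias, target)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as workdir:
        alias = os.path.join(workdir, "null_alias")
        make_alias("/dev/null", alias)
        print(os.readlink(alias))
        print(is_alias_for(alias, "/dev/null"))
```

Because the link only stores the name of its target, anything done to the original node (a change of permissions, say) is automatically seen through the link, which is the whole reason for linking instead of duplicating.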

Finding Out More

Many of the devices on your system have associated man pages. Figuring out which man page you need, however, isn't always straightforward. If you are unsure what a particular device is used for, you can usually figure out the meaning of the base name. For example, hd stands for IDE hard disks, sd for SCSI hard disks, fd for floppy drives, and so forth. Often there is a general man page for that type of device, so man sd will call up the page for SCSI type hard drives. Alternatively, you can use the -k option on man to search for a particular keyword. For example, man -k disk will show you all of the man pages whose name or short description contains the word "disk."

Man pages are useful when they exist. Unfortunately, not all devices have an associated man page. If this is the case, you can usually find some information in the Documentation subdirectory of the kernel source tree (typically /usr/src/linux). There, you will find a file called devices.txt, which is a reasonably comprehensive list of the major numbers. Often this file will also list the minor numbers for each device, or at least give an explanation of the related minor numbering scheme.
