Linux Software RAID and IDE Disks

I recently did some experiments and decided I should write down the results in case someone else finds them useful. There seems to be quite a lot of questionable information floating around about the ATA (commonly known as IDE) interface. It's often said that two drives on an ATA channel do not share bandwidth well. Based on some experimentation, that does not appear to be true. We have a server with four Maxtor 4D080H4 (80 GB, 5400 RPM) drives and an on-board AMD 7441 controller. The four drives are connected to two channels: /dev/hda and /dev/hdb are on one channel while /dev/hdc and /dev/hdd are on the other. I used Andrew Morton's write-and-fsync tool to do some tests.

$ time write-and-fsync /dev/hda3 900 & \
    time write-and-fsync /dev/hdb3 900
  real    0m54.393s
  user    0m0.000s
  sys     0m7.290s

  real    0m55.955s
  user    0m0.010s
  sys     0m12.570s

$ time write-and-fsync /dev/hda3 900 & \
    time write-and-fsync /dev/hdc3 900
  real    0m55.102s
  user    0m0.000s
  sys     0m9.110s

  real    0m55.330s
  user    0m0.010s
  sys     0m14.470s
  

That's about 16 MB/s of sustained write speed to each drive in both cases (900 MB in roughly 55 seconds), which seems to correspond to the maximum sustained write speed of the drive. It looks like there is no performance problem with having two drives on the same channel. It's possible that the bus bandwidth would become a problem with faster drives. Unfortunately I don't have any faster ATA drives to test with, but since the bus is running UDMA/100 I'm guessing even faster drives would not saturate it.
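
The numbers above came from Andrew Morton's write-and-fsync tool. If you don't have it handy, a rough stand-in (my suggestion, not what was used above) is to write a fixed amount of data with dd and then flush with sync. Note that sync flushes all dirty data rather than just the one file, and that writing to a raw partition destroys whatever is on it, so treat this only as an approximation:

  # WARNING: overwrites the named partitions
  $ time sh -c 'dd if=/dev/zero of=/dev/hda3 bs=1024k count=900; sync' & \
    time sh -c 'dd if=/dev/zero of=/dev/hdc3 bs=1024k count=900; sync'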

While putting two drives on one channel isn't a problem for performance, it is a reliability problem. If one drive fails, it will usually bring down the whole channel, taking the second drive with it. I did some dangerous, cowboy-style tests and observed exactly this. RAID5 can only survive the loss of a single drive, so if you are using RAID5 and want your system to continue running after a drive failure, you cannot put two array members on one ATA channel.
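
If you want to see what the kernel and the md driver think happened after a failure like this, the usual places to look are the kernel log and /proc/mdstat (my suggestion; the article doesn't say how the failures were observed):

  $ dmesg | tail -20    # look for DMA timeouts or resets on the affected channel
  $ cat /proc/mdstat    # failed members are marked (F) and a degraded mirror shows e.g. [U_]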

One solution to this problem is to get another ATA controller and put each drive on its own channel. Another solution I discovered is to use RAID0+1 (also known as RAID10) instead of RAID5. Make two reliable RAID devices by mirroring two sets of two drives, then improve performance by striping across the two mirrors to create a RAID0+1 device. For example, assume you have the disks /dev/hda, /dev/hdb, /dev/hdc, and /dev/hdd, where hda and hdb are on one channel and hdc and hdd are on the other. Create a RAID1 device using hda and hdc. Create a second RAID1 device using hdb and hdd. Create a RAID0 device using the two RAID1 devices. With this configuration, if one channel goes down your system should continue running, because each mirror loses only one of its two members. I tested this by pulling out the power connector of one of the drives while the server was using the RAID0+1 device. As predicted, the associated ATA channel went down, but after a few seconds of stalling the server continued running in degraded mode. After shutting down, reconnecting the power, and restarting, I was able to hot-add the two drives on the failed channel back to the RAID array.
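
Here is one way to build that layout. This is only a sketch using mdadm; the article doesn't say which tool was actually used (raidtools and /etc/raidtab were the other common option at the time), and the partition and md device numbers are assumptions, chosen so that the stripe ends up as the /dev/md6 tested below:

  $ mdadm --create /dev/md4 --level=1 --raid-devices=2 /dev/hda3 /dev/hdc3   # first mirror, one drive per channel
  $ mdadm --create /dev/md5 --level=1 --raid-devices=2 /dev/hdb3 /dev/hdd3   # second mirror, one drive per channel
  $ mdadm --create /dev/md6 --level=0 --raid-devices=2 /dev/md4 /dev/md5     # stripe across the two mirrors
  $ cat /proc/mdstat                                                         # confirm all members are up

After the failed channel comes back, the same tool can hot-add its drives to the mirrors:

  $ mdadm /dev/md4 --add /dev/hdc3
  $ mdadm /dev/md5 --add /dev/hdd3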

Another advantage of RAID0+1 over RAID5 is write performance. Writing to a RAID5 device is slow because every write also has to update parity; I found that writing to a RAID0+1 device is about twice as fast. I repeated the write-and-fsync test on a RAID0+1 device.

$ time ./write-and-fsync /dev/md6 900
  real    0m28.010s
  user    0m0.010s
  sys     0m6.110s
  

That works out to about 32 MB/s (900 MB in 28 seconds). Running bonnie++ on a ReiserFS filesystem on a RAID0+1 device also produces nice numbers: 52 MB/s write, 26 MB/s rewrite, and 62 MB/s read. Remember, that's with four cheap 5400 RPM drives on two ATA channels.
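
If you want to run a similar test, something along these lines should do it. The mount point, test size, and user are my own placeholders, not what was actually used; bonnie++ generally wants a test size of at least twice the machine's RAM:

  $ mkreiserfs /dev/md6
  $ mount /dev/md6 /mnt/test
  $ bonnie++ -d /mnt/test -s 2048 -u nobody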

For kicks, I compared NFSv3 write performance of a Linux RAID0+1 device against a high-end hardware RAID5 SCSI device connected to a Sun Ultra Enterprise server. The hardware RAID device uses Ultra Wide 7200 RPM Quantum SCSI drives and connects to the server using Ultra Wide SCSI-160. To be fair, I configured the Linux NFS server to use synchronous writes. My test was to extract the Linux kernel from a local tar file onto the NFS mount. Writing to the Sun server took about 170 seconds while writing to the Linux server took about 150 seconds. That's pretty respectable for cheap hardware.
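
The synchronous-write part is just an export option on the Linux server. A sketch of what the relevant /etc/exports entry and the client-side test might look like (the path, client network, tarball name, and mount point are made-up placeholders):

  # /etc/exports on the Linux server; 'sync' makes the server commit writes
  # to disk before replying to the client
  /export/raid   192.168.1.0/24(rw,sync)

  # on the client
  $ time tar xzf /tmp/linux-kernel.tar.gz -C /mnt/nfs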

Put your root partition on a RAID1 device (each half of a mirror still looks like an ordinary partition, so the machine can boot from it) and everything else on RAID0+1 devices and you have cheap, fast, redundant storage. Cool.
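
To make that concrete, the overall layout might look something like this; every device name, filesystem, and mount point here is illustrative rather than taken from the article:

  # /dev/md0 : RAID1 of hda1 + hdc1   -> /      (bootable mirror)
  # /dev/md4 : RAID1 of hda3 + hdc3
  # /dev/md5 : RAID1 of hdb3 + hdd3
  # /dev/md6 : RAID0 of md4 + md5     -> bulk storage

  # /etc/fstab
  /dev/md0   /        ext3       defaults   1 1
  /dev/md6   /home    reiserfs   defaults   1 2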

Comments?

Updated: 2003-03-03