Wednesday, October 20, 2010

RAID Manipulation

Playing with LVM2 is nice because you can manipulate and assign the available space you have any way you want. The real unsung hero, though, is the RAID5 volume we created underneath, because it guarantees that even if one disk fails we will not lose our data. Let's play with the RAID disks now.
Fail a RAID disk and recover

Hopefully you will never have to face a failed disk, but it is quite important to know what to do if one fails. For now, in our comfortable sandbox, we will simulate a failed disk and check whether we lose any data.
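
To simulate the failure I'll use mdadm's --fail option. The device names below are assumptions for this sandbox (the array as /dev/md0 and one of the loop disks from the earlier posts as /dev/loop1); adjust them to your own setup. Something like:

    # mark one member of the array as faulty
    mdadm /dev/md0 --fail /dev/loop1
    # check the state of the array afterwards
    mdadm --detail /dev/md0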



Notice that the state is clean, degraded, meaning that the data is fine but the state of the array is, well, degraded.

Do we still have our files?
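
Assuming the Logical Volume from the earlier posts is mounted somewhere like /mnt/data (a hypothetical mount point; use your own), a quick look is enough:

    ls -l /mnt/data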



Yep, still here. Now, let's remove the failed disk (as if you were opening the computer case and pulling out the failed drive).
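
With the same assumed names as before (/dev/md0 and /dev/loop1), the removal is roughly:

    # remove the faulty member from the array
    mdadm /dev/md0 --remove /dev/loop1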



OK, the disk is removed. Now let's assume that you have replaced the failed hard drive, recreated a partition similar to the one originally used, and are ready to fix the degraded RAID array.
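
Adding the replacement back is a one-liner; again, /dev/md0 and /dev/loop1 are just this sandbox's assumed names:

    # add the replacement disk to the degraded array
    mdadm /dev/md0 --add /dev/loop1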



Now, immediately after adding the disk, the array will start reconstructing itself. It should look something like this:
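
A sketch of how to watch it (the exact output depends on your mdadm version):

    mdadm --detail /dev/md0    # look for "State : clean, degraded, recovering" and the "Rebuild Status" line
    cat /proc/mdstat           # another quick way to watch the rebuild progress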



Notice how the state is clean, degraded, recovering and that the Rebuild Status is at 32%. Depending on the size of your array the process might take anywhere from a few minutes (I doubt it) up to several hours or days. You can always check the progress with the above command. The array should still be usable and accessible even while it is being reconstructed, but I really wouldn't like to push my luck, so I wouldn't suggest doing any work on the array until the reconstruction is done.

In our case the reconstruction, for a tiny 120MB array, takes mere seconds, and if we check the state again we see:
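
Same check as before:

    mdadm --detail /dev/md0    # the State line should simply read "clean" again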



Nice and clean. Let's hope you never have to go through a procedure like this, EVER.

Increase the RAID array size

Up until now we were playing with LVM2 volumes and the allocated space that was assigned to the LVM2 Physical Volume. But what should we do if we want to create a Logical Volume larger than the 49 PE that the Volume Group contains? Well, not much, unless we increase the number of PE.

This can be done in two ways:

  • Increase the number of disks in the RAID5 array or
  • Increase the size of the disks already in the array.


Add new disks to the RAID array

Adding a new disk to the array is easy (assuming you still have free SATA ports on your motherboard).

Let's run an example:
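
Since the "disks" in this sandbox are loop devices, a new one has to be created first. The backing file name and the /dev/loop4 device below are assumptions; the array is again taken to be /dev/md0:

    # create a new 100MB backing file and turn it into a loop device
    dd if=/dev/zero of=disk4.img bs=1M count=100
    losetup /dev/loop4 disk4.img
    # add it to the existing array
    mdadm /dev/md0 --add /dev/loop4
    mdadm --detail /dev/md0    # the new device should show up as a spare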



So we added the newly created disk to the array and mdadm considers the new disk a "spare" disk. A spare disk is one that sits in standby mode, ready to substitute for a failed disk (automatically, I think; this needs investigation). A spare disk's capacity does not count towards the total capacity of the array, since it is only on standby. Let's make it a part of the array:
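
Growing the array from 3 to 4 active devices is done with mdadm's --grow option (again assuming /dev/md0):

    # turn the spare into a full member by growing the array to 4 devices
    mdadm --grow /dev/md0 --raid-devices=4
    cat /proc/mdstat           # watch the reshape progress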



Immediately after the new disk is added to the array, mdadm starts reconstructing the array. Here we see a 19% complete status and the upgrade notification of 3->4 disks. Again, depending on your disks' sizes this might take some time, probably longer than adding a replacement for a failed disk, because all the data has to be rewritten and spread across the new disk.

The array should still be functional and up and running during the whole procedure. Once it's done we have to allocate the extra space to the LVM2 Physical Volume and resize the lvm-group LVM2 Volume Group.

First let's see how much space the Physical Volume has:
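
Assuming, as before, that the Physical Volume sits directly on /dev/md0:

    pvdisplay /dev/md0    # PV Size should still show roughly 200MB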



So, although we have added a fourth disk to the RAID array, the LVM2 Physical Volume is still at 200MB.

Let's increase it:
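
One command is enough (still assuming the PV is /dev/md0):

    # grow the Physical Volume to fill the now-larger array
    pvresize /dev/md0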



As you can see, the Physical Volume is now 300MB. A nice side-effect is that the lvm-group Volume Group was also automatically resized: since we initially created it using all the available space of the Physical Volume, it keeps using all the available PE. So now we have a total of 74 PE to play with. If you want to increase the size of your Logical Volumes you can now easily do it (see 'Resize Volumes' above).

But do we still have our data?
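
Same hypothetical mount point as before:

    ls -l /mnt/data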



Yep, still here.

Increase the size of the RAID disks

In reality (as in my personal case) you will end up increasing the initial RAID size because you want to substitute the original disks with larger ones. Prices keep dropping and the number of SATA ports on a motherboard is finite. So, sooner or later, you will need to replace the disks with larger ones.

Unfortunately, you cannot do the upgrade at your leisure. That is, you can't just drop in a 1TB disk together with a couple of 640GB disks and expect to create a RAID5 array larger than the original (3-1) x 640GB = 1280GB RAID5 array. You will have to upgrade all the disks if you want to expand your RAID5 array. You can always upgrade just one disk, though, and use the extra space of the larger disk as another partition.

So, for now, we have a 4-disk array with 100MB disks. Let's substitute each disk with a 150MB one (one at a time).
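
Here is a rough sketch for the first disk, with the same caveat that /dev/md0, /dev/loop1 and the backing file name are this sandbox's assumed names:

    # fail and remove the old 100MB disk
    mdadm /dev/md0 --fail /dev/loop1
    mdadm /dev/md0 --remove /dev/loop1
    # detach the old loop device and re-create it on top of a bigger, 150MB file
    losetup -d /dev/loop1
    dd if=/dev/zero of=bigdisk1.img bs=1M count=150
    losetup /dev/loop1 bigdisk1.img
    # add the "new" bigger disk back into the array
    mdadm /dev/md0 --add /dev/loop1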



What we did was fail, remove and substitute one disk with another, bigger one. As always, the procedure is not instantaneous and "adding" the larger disk to the array will take some time.

When the array has finished reconstructing from the last add, lather, rinse and repeat for the remaining disks (loop2, loop3 and loop4 in our case). When you are done with all the disks:
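
Check the array size again (still assuming /dev/md0):

    mdadm --detail /dev/md0    # check the Array Size line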



The array is still (4-1) * 100MB = 300MB! What happens is that when the larger disks are introduced into the already existing array, mdadm only uses as much of each one as the existing members offer. So, although we are adding a 150MB disk, the array is using only 100MB of it, just like the already existing older 100MB disks.

Once we have substituted all four disks, we can resize the array and use the remaining 50MB of each disk. Let's do it:
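
Growing the array to the full size of its members is again a job for --grow; the Physical Volume then most likely needs another pvresize to pick up the new space (same assumed device names):

    # let the array use all the space its members now offer
    mdadm --grow /dev/md0 --size=max
    # and let LVM see the extra space as well
    pvresize /dev/md0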



Finally done. The array is now (4-1) * 150MB = 450MB. But what about the Volume Group we had?
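
The Volume Group from the earlier posts was called lvm-group, so:

    vgdisplay lvm-group    # check the Total PE line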



As you can see, we went from 74 PE to 112 PE. Now we can go back to normal LVM2 volume manipulation to take advantage of the gained space.

Recover RAID array

OK, but what do you do when something goes wrong and you lose your RAID array?

Let's just say that your whole motherboard fails and you have to take your disks and set up the machine again. Or you re-install the OS and want to remount your RAID array.

The file necessary to identify and assemble the RAID array lives at /etc/mdadm/mdadm.conf and it looks like this:
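
A rough sketch of its contents (the exact fields depend on your mdadm version, and the UUID placeholder stands for your array's own UUID):

    DEVICE partitions
    ARRAY /dev/md0 level=raid5 num-devices=4 UUID=<uuid-of-your-array>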



Note the ARRAY definitions that tell mdadm what arrays you are currently using.

On a newly installed system this file will probably be missing and you will have to recreate it.

If you already have an /etc/mdadm/mdadm.conf file on your system but are missing the ARRAY definitions, you can force mdadm to create these definitions for you:
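
mdadm can scan the running arrays and print the matching ARRAY lines:

    mdadm --detail --scan    # prints one ARRAY line per active array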



All you have to do is append these lines at the end of the /etc/mdadm/mdadm.conf file.

If you don't have an mdadm.conf file at all you can quickly produce one:
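
A minimal sketch (run as root; the DEVICE line is a common, permissive default):

    echo "DEVICE partitions" > /etc/mdadm/mdadm.conf
    mdadm --detail --scan >> /etc/mdadm/mdadm.conf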



Now, if you reboot you should have your array back. Well done!