I was not sure exactly where to post this, but I guess this is the best spot... I currently have a 4TB RAID 5 box that I use for file storage at home. I am using a 3Ware 9550-SXU 4LP RAID card and currently have four 1TB Seagate Barracuda 7200 rpm (st31000340as) drives in a RAID 5 array. I use the 3dm2 software to monitor the array from my Windows XP box. The RAID box runs FreeNAS 0.69b2 (revision 3631) / FreeBSD 6.3-STABLE (revision 199506), and the server has been running since about August of 2008 with no problems until now... I was just transferring a file from my server to my desktop when I received an email from 3dm2 stating there had been an ECC error on the drive located at subunit 1, port 2. I logged into 3dm2 to see if there was any further information. When I looked at the array I noticed that 2 of the 4 drives had a status of OK, and the drive on port 2 did in fact have a warning and a status of ECC error. In addition, I noticed that the drive on subunit 3, port 0 had a status of Degraded and that the array was currently rebuilding, at about 25% completion.
I checked the alarms tab and this is what's listed (disregard the times and dates, as they are not set correctly):

Oct 20, 2009 12:25.58AM (0x04:0x003B): Rebuild paused: unit=0
Oct 20, 2009 12:25.58AM (0x04:0x0004): Rebuild failed: unit=0
Oct 20, 2009 12:25.58AM (0x04:0x002D): Source drive error occurred: port=2, unit=0
Oct 20, 2009 12:25.58AM (0x04:0x0026): Drive ECC error reported: port=2, unit=0
Oct 20, 2009 12:17.16AM (0x04:0x0202): Drive ECC error: port=2
Oct 20, 2009 12:17.04AM (0x04:0x0202): Drive ECC error: port=2
Oct 20, 2009 12:16.51AM (0x04:0x0202): Drive ECC error: port=2
Oct 20, 2009 12:16.38AM (0x04:0x0202): Drive ECC error: port=2
Oct 20, 2009 12:16.25AM (0x04:0x0202): Drive ECC error: port=2
Oct 20, 2009 12:16.12AM (0x04:0x0202): Drive ECC error: port=2
Oct 20, 2009 12:15.59AM (0x04:0x0202): Drive ECC error: port=2
Oct 20, 2009 12:15.47AM (0x04:0x0202): Drive ECC error: port=2
Oct 20, 2009 12:01.21AM (0x04:0x000B): Rebuild started: unit=0

I still have access to all of the files stored on the server, and all of the important files are backed up to an offsite location, but I still have a lot of files that are not backed up and that I would prefer not to lose. My question is: what should I do next? I have a spare 1TB drive ready to insert into the array, but which drive do I swap out: the degraded drive, the drive with the ECC error, or neither? I am currently transferring the files that are not backed up to another box, but I would like to save this array because of its sheer size and the time it would take to transfer all the files from scratch. Any help would be greatly appreciated, and if any more information is needed, do not hesitate to ask. Thanks for your time and help...
Remove the ECC error drive from the array and have it replaced. When the new drive is fitted and the array is rebuilt, remove the degraded drive and have it SMART checked on its own.
I want to thank you for your response, but I do have a question. Since the RAID rebuild has not completed, if I pull the drive with the ECC error and replace it, leaving 2 drives that are OK, one that is degraded and one that is completely blank, am I going to lose the files on the array? I guess the question is: what exactly does a degraded drive mean? Thank you in advance again...
RAID 5 only protects against the loss of one drive. If you take two out, the array will break and all data will be lost; the array can only be rebuilt if at least three of the four drives are intact.
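To illustrate why RAID 5 survives one missing drive but not two, here is a minimal Python sketch of XOR parity on a single stripe. It is a simplification and not how the 3Ware controller is actually implemented: real controllers rotate parity across the drives and work on much larger blocks. The names `xor_blocks` and `rebuild` and the sample data are made up for the example.

```python
from functools import reduce

def xor_blocks(blocks):
    """XOR equal-length byte blocks together, byte by byte."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))

# One stripe on a hypothetical 4-drive RAID 5:
# three data blocks plus one parity block (the XOR of the data blocks).
data = [b"AAAA", b"BBBB", b"CCCC"]
parity = xor_blocks(data)
stripe = data + [parity]

def rebuild(stripe, missing):
    """Rebuild the single block whose index is in `missing` from the survivors."""
    if len(missing) != 1:
        # With two or more blocks gone, the XOR of the survivors no longer
        # determines any individual missing block.
        raise ValueError("RAID 5 can rebuild exactly one missing block per stripe")
    survivors = [blk for i, blk in enumerate(stripe) if i not in missing]
    return xor_blocks(survivors)

# Losing any one drive (data or parity) is recoverable.
print(rebuild(stripe, {1}) == data[1])
print(rebuild(stripe, {3}) == parity)

# Losing two drives is not.
try:
    rebuild(stripe, {1, 2})
except ValueError as err:
    print(err)
```

The point of the sketch is that a rebuild has to read every surviving drive, so a read error on one of the source drives during a rebuild leaves that stripe with two unknown blocks.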
Remove the drive with the ECC error, and put the new drive in its place. Rebuild from there. Sammoris's advice is for if you had a free port to plug in a 5th drive.
He has 4 ports and 4 drives. Any one drive can fail and/or be removed without data loss. When one fails, you replace that drive... you don't replace one of the three that are holding all the data. If you do that, then you lose everything!
Yeah, there must have been some confusion; that's what I meant and what I thought I said :S After removing the faulty drive, I did try to explain that you'd need to put its replacement in before removing the other. Perhaps I wasn't clear, and if not, I apologise.