Tonight I had to replace a disk in my FreeNAS box that was completely dead, as in not detected by the BIOS. Below are the steps I used to replace it. The FreeNAS docs have an article on replacing a failed disk, but it does not cover a disk that is no longer detected by the system. You can read that article here.
A zpool status shows the disk as unavailable:
[root@freenas] ~# zpool status -v zpool0
pool: zpool0
state: DEGRADED
status: One or more devices could not be opened. Sufficient replicas exist for
the pool to continue functioning in a degraded state.
action: Attach the missing device and online it using 'zpool online'.
see: http://illumos.org/msg/ZFS-8000-2Q
scan: scrub repaired 0 in 6h31m with 0 errors on Sun Jul 28 06:31:27 2013
config:
NAME STATE READ WRITE CKSUM
zpool0 DEGRADED 0 0 0
mirror-0 DEGRADED 0 0 0
3282272283788900661 UNAVAIL 0 0 0 was /dev/gptid/3937b1c2-fec4-11d5-a8b2-001f2961db
gptid/398a9808-fec4-11d5-a8b2-001f2961db70 ONLINE 0 0 0
mirror-1 ONLINE 0 0 0
gptid/998b8dc4-ff2b-11d5-a8b2-001f2961db70 ONLINE 0 0 0
gptid/99e507d9-ff2b-11d5-a8b2-001f2961db70 ONLINE 0 0 0
errors: No known data errors
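If you would rather not scan the listing by eye, a small filter can pull out the problem devices. This is only a sketch: the function name list_unhealthy is made up for illustration, and it assumes the table layout shown above (a NAME/STATE header followed by one device per line).

```shell
#!/bin/sh
# Print the name and state of every pool member whose state is
# UNAVAIL, OFFLINE, FAULTED or REMOVED.
# Reads `zpool status` output on stdin; only the config table is examined.
list_unhealthy() {
    awk '
        $1 == "NAME" && $2 == "STATE" { in_config = 1; next }  # table header
        /^errors:/                    { in_config = 0 }        # table ends here
        in_config && $2 ~ /^(UNAVAIL|OFFLINE|FAULTED|REMOVED)$/ { print $1, $2 }
    '
}

# On the NAS itself (not run here): zpool status -v zpool0 | list_unhealthy
# Demo using lines copied from the listing above:
list_unhealthy <<'EOF'
config:
NAME STATE READ WRITE CKSUM
zpool0 DEGRADED 0 0 0
mirror-0 DEGRADED 0 0 0
3282272283788900661 UNAVAIL 0 0 0 was /dev/gptid/3937b1c2-fec4-11d5-a8b2-001f2961db
gptid/398a9808-fec4-11d5-a8b2-001f2961db70 ONLINE 0 0 0
errors: No known data errors
EOF
```

The pool and mirror lines are skipped on purpose, since DEGRADED there only reflects the state of the member underneath.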
Next I took a screenshot of all my disks in the webGUI and noted the serial numbers of the disks in the pool. The failed disk does not show up in that list, which is how we identify the physical disk to pull. Shut down the server and pull one disk at a time until you find the one whose serial number is not in your list. Replace it with the new disk, noting the new disk's serial number. Then power on the system and log in via SSH.
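The serial numbers can also be collected from the shell instead of a screenshot. The sketch below assumes the FreeBSD `geom disk list` layout, where each disk has a "Geom name:" line and an "ident:" line carrying the serial number. The function name disk_serials and the serial numbers in the demo are made up for illustration.

```shell
#!/bin/sh
# Print "device serial" pairs from `geom disk list` output read on stdin.
# Assumes each disk has a "Geom name:" line followed later by an
# "ident:" line holding its serial number.
disk_serials() {
    awk '
        $1 == "Geom" && $2 == "name:" { dev = $3 }       # remember current disk
        $1 == "ident:"                { print dev, $2 }  # emit disk and serial
    '
}

# On the NAS itself (not run here): geom disk list | disk_serials
# Demo with made-up model and serial values:
disk_serials <<'EOF'
Geom name: ada0
Providers:
1. Name: ada0
   descr: EXAMPLE DISK MODEL
   ident: SERIAL-AAA111
Geom name: ada1
Providers:
1. Name: ada1
   descr: EXAMPLE DISK MODEL
   ident: SERIAL-BBB222
EOF
```

Run before shutdown, this gives the list of serials that are still detected. Run again after the swap, it shows which device name belongs to the new disk's serial.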
Next, offline the failed disk:
[root@freenas] ~# zpool offline zpool0 /dev/gptid/3937b1c2-fec4-11d5-a8b2-001f2961db70
Check the status of the disk to ensure it's offline:
[root@freenas] ~# zpool status -v zpool0
pool: zpool0
state: DEGRADED
status: One or more devices has been taken offline by the administrator.
Sufficient replicas exist for the pool to continue functioning in a
degraded state.
action: Online the device using 'zpool online' or replace the device with
'zpool replace'.
scan: scrub repaired 0 in 6h31m with 0 errors on Sun Jul 28 06:31:27 2013
config:
NAME STATE READ WRITE CKSUM
zpool0 DEGRADED 0 0 0
mirror-0 DEGRADED 0 0 0
3282272283788900661 OFFLINE 0 0 0 was /dev/gptid/3937b1c2-fec4-11d5-a8b2-001f2961db
gptid/398a9808-fec4-11d5-a8b2-001f2961db70 ONLINE 0 0 0
mirror-1 ONLINE 0 0 0
gptid/998b8dc4-ff2b-11d5-a8b2-001f2961db70 ONLINE 0 0 0
gptid/99e507d9-ff2b-11d5-a8b2-001f2961db70 ONLINE 0 0 0
errors: No known data errors
Now replace the disk in the pool with your new disk. You can use the webGUI to get the block device name, looking for the serial number of the new device you noted above:
[root@freenas] ~# zpool replace zpool0 /dev/gptid/3937b1c2-fec4-11d5-a8b2-001f2961db70 /dev/ada2
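The command above hands the whole raw disk to ZFS. One of the comments below points out that FreeNAS normally creates a swap partition, aligns partitions to 4k, and refers to the data partition by GPTID. A rough sketch of that partition-based variant follows. It only prints the commands rather than running them, the function name print_replace_plan is hypothetical, the 2g swap size is an assumption, and I have not verified that this matches exactly what the GUI does.

```shell
#!/bin/sh
# Print (do not execute) a partition-based replacement plan.
# Arguments: pool, old device as passed to zpool offline, new disk,
# and an optional swap size (the 2g default is an assumption).
print_replace_plan() {
    pool=$1; old=$2; disk=$3; swap=${4:-2g}
    cat <<EOF
gpart create -s gpt $disk
gpart add -a 4k -t freebsd-swap -s $swap $disk
gpart add -a 4k -t freebsd-zfs $disk
gpart list $disk | grep rawuuid
zpool replace $pool $old gptid/<rawuuid of ${disk}p2>
EOF
}

# Demo with the pool, old device and new disk from this post:
print_replace_plan zpool0 /dev/gptid/3937b1c2-fec4-11d5-a8b2-001f2961db70 ada2
```

The rawuuid placeholder in the last line has to be filled in by hand from the `gpart list` output for the second partition. Review each printed command before running anything, since `gpart create` is destructive on a disk that already holds data.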
Now just online the disk and ensure it says the new disk is resilvering:
[root@freenas] ~# zpool online zpool0 /dev/ada2
[root@freenas] ~# zpool status -v zpool0
pool: zpool0
state: DEGRADED
status: One or more devices is currently being resilvered. The pool will
continue to function, possibly in a degraded state.
action: Wait for the resilver to complete.
scan: resilver in progress since Tue Sep 3 21:32:02 2013
28.1M scanned out of 3.53T at 1.17M/s, (scan is slow, no estimated time)
15.4M resilvered, 0.00% done
config:
NAME STATE READ WRITE CKSUM
zpool0 DEGRADED 0 0 0
mirror-0 DEGRADED 0 0 0
replacing-0 DEGRADED 0 0 0
3282272283788900661 OFFLINE 0 0 0 was /dev/gptid/3937b1c2-fec4-11d5-a8b2-001f2961db70
ada2 ONLINE 0 0 0 (resilvering)
gptid/398a9808-fec4-11d5-a8b2-001f2961db70 ONLINE 0 0 0
mirror-1 ONLINE 0 0 0
gptid/998b8dc4-ff2b-11d5-a8b2-001f2961db70 ONLINE 0 0 0
gptid/99e507d9-ff2b-11d5-a8b2-001f2961db70 ONLINE 0 0 0
errors: No known data errors
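A resilver of this size takes a while, so it can be handy to extract just the progress figure. This is a minimal sketch that assumes the "resilvered, N% done" line format shown in the output above. The function name resilver_percent is made up for illustration.

```shell
#!/bin/sh
# Print the "% done" figure from `zpool status` output read on stdin.
# Prints nothing if no resilver progress line is present.
resilver_percent() {
    awk '/resilvered,/ && /% done/ {
        for (i = 1; i <= NF; i++)
            if ($i ~ /%$/) { sub(/%$/, "", $i); print $i; exit }
    }'
}

# On the NAS itself (not run here), poll once a minute:
#   while sleep 60; do zpool status zpool0 | resilver_percent; done
# Demo using the scan lines from the listing above:
resilver_percent <<'EOF'
scan: resilver in progress since Tue Sep 3 21:32:02 2013
28.1M scanned out of 3.53T at 1.17M/s, (scan is slow, no estimated time)
15.4M resilvered, 0.00% done
EOF
```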
Thanks! Saved my bacon :-)
Why aren't you using the GUI for this? When a disk is no longer detected, it'll show up as missing and can be replaced directly.
Your manual guide misses some crucial parts, e.g. you are not creating a swap partition, you are not using GPTIDs (which is why the replacement disk shows up as /dev/ada2), and since you didn't use partitions you also aren't doing any 4k sector alignment, which could lead to performance penalties.
As you can see the FreeNAS system does quite a bit of work behind the scenes, and if you don't have a deep knowledge of these internal processes you shouldn't mess with the CLI.
According to your zpool status, the drive that was UNAVAILABLE was
/dev/gptid/3937b1c2-fec4-11d5-a8b2-001f2961db
Why did you zpool replace:
/dev/gptid/3937b1c2-fec4-11d5-a8b2-001f2961db70
i.e. why did you add the "70" to the information given by zpool status, or was one or the other a typo?
Also, from mysticx, how do you use the GUI for this?
You absolutely saved my life, I am glad there are people like you explaining the things that go wrong beyond the freenas support. A++ well done...