Tuesday, September 3, 2013

FreeNAS - Replacing a Failed Disk

Tonight I had to replace a disk in my FreeNAS box that was completely dead, as in not detected by the BIOS. Below are the steps to replace a completely failed disk. The FreeNAS docs have an article on replacing a failed disk, but it does not cover a disk that is no longer detected by the system. You can read that article here.

A zpool status shows the disk as unavailable:

[root@freenas] ~# zpool status -v zpool0
  pool: zpool0
 state: DEGRADED
status: One or more devices could not be opened.  Sufficient replicas exist for
        the pool to continue functioning in a degraded state.
action: Attach the missing device and online it using 'zpool online'.
   see: http://illumos.org/msg/ZFS-8000-2Q
  scan: scrub repaired 0 in 6h31m with 0 errors on Sun Jul 28 06:31:27 2013
config:

        NAME                                            STATE     READ WRITE CKSUM
        zpool0                                          DEGRADED     0     0     0
          mirror-0                                      DEGRADED     0     0     0
            3282272283788900661                         UNAVAIL      0     0     0  was /dev/gptid/3937b1c2-fec4-11d5-a8b2-001f2961db
            gptid/398a9808-fec4-11d5-a8b2-001f2961db70  ONLINE       0     0     0
          mirror-1                                      ONLINE       0     0     0
            gptid/998b8dc4-ff2b-11d5-a8b2-001f2961db70  ONLINE       0     0     0
            gptid/99e507d9-ff2b-11d5-a8b2-001f2961db70  ONLINE       0     0     0

errors: No known data errors
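
On a pool with many disks it can be easier to filter that output than to read it by eye. Below is a minimal sketch in POSIX sh, assuming the table layout shown above. The function name `unhealthy_devices` is my own invention and not a FreeNAS or ZFS tool; it only parses text from stdin and changes nothing on the pool.

```sh
#!/bin/sh
# Hypothetical helper: read `zpool status` text on stdin and print every
# pool member whose state is UNAVAIL, OFFLINE, FAULTED or REMOVED.
unhealthy_devices() {
    awk '
        # The device table starts after the NAME/STATE header row.
        $1 == "NAME" && $2 == "STATE" { in_config = 1; next }
        # The "errors:" line marks the end of the table.
        /^errors:/ { in_config = 0 }
        # Print the device name and its state for the states listed above.
        in_config && $2 ~ /^(UNAVAIL|OFFLINE|FAULTED|REMOVED)$/ { print $1, $2 }
    '
}
```

Usage would look like `zpool status zpool0 | unhealthy_devices`, which for the output above should print the numeric ID of the missing disk followed by its state.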

Next I took a screenshot of all my disks in the webGUI and noted the serial numbers of the disks in the pool. The failed disk will not show up in that list, so we can use it to identify which physical disk to pull. Shut down the server and pull one disk at a time until you find the one whose serial number is not in your list. Replace it with the new disk, noting the new disk's serial number. Then power on the system and log in via SSH.
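
If you prefer text files to a screenshot, the comparison can be sketched as below. This assumes two files you create yourself with one serial per line: one with the serials read off the drive labels and one with the serials the system still detects (from the webGUI, or from the Serial Number line that `smartctl -i /dev/ada0` prints for a detected disk). The function name and file names are hypothetical.

```sh
#!/bin/sh
# Hypothetical helper: print serials that are in the first file (read off
# the drive labels) but not in the second (still detected by the system).
missing_serials() {
    labels=$1
    detected=$2
    # -v invert the match, -x match whole lines only,
    # -F treat patterns as fixed strings, -f read patterns from a file
    grep -vxFf "$detected" "$labels"
}
```

Running `missing_serials labels.txt detected.txt` prints the serial of any disk that is physically present but no longer detected, which is the one to pull.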

Next, offline the failed disk:

[root@freenas] ~# zpool offline zpool0 /dev/gptid/3937b1c2-fec4-11d5-a8b2-001f2961db70

Check the status of the disk to ensure it's offline:

[root@freenas] ~# zpool status -v zpool0
  pool: zpool0
 state: DEGRADED
status: One or more devices has been taken offline by the administrator.
        Sufficient replicas exist for the pool to continue functioning in a
        degraded state.
action: Online the device using 'zpool online' or replace the device with
        'zpool replace'.
  scan: scrub repaired 0 in 6h31m with 0 errors on Sun Jul 28 06:31:27 2013
config:

        NAME                                            STATE     READ WRITE CKSUM
        zpool0                                          DEGRADED     0     0     0
          mirror-0                                      DEGRADED     0     0     0
            3282272283788900661                         OFFLINE      0     0     0  was /dev/gptid/3937b1c2-fec4-11d5-a8b2-001f2961db
            gptid/398a9808-fec4-11d5-a8b2-001f2961db70  ONLINE       0     0     0
          mirror-1                                      ONLINE       0     0     0
            gptid/998b8dc4-ff2b-11d5-a8b2-001f2961db70  ONLINE       0     0     0
            gptid/99e507d9-ff2b-11d5-a8b2-001f2961db70  ONLINE       0     0     0


errors: No known data errors
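
To check a single member without reading the whole table, something like the following sketch could work. It assumes the device name or numeric ID is the first column and the state is the second, as in the output above; `device_state` is a made-up name, not part of ZFS.

```sh
#!/bin/sh
# Hypothetical helper: read `zpool status` text on stdin and print the
# STATE column for the device whose NAME column equals the first argument.
device_state() {
    awk -v dev="$1" '$1 == dev { print $2; exit }'
}
```

For example, `zpool status zpool0 | device_state 3282272283788900661` should print OFFLINE at this point.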

Now replace the disk in the pool with your new disk. You can use the webGUI to get the block device name, looking for the serial number of the new device you noted above:

[root@freenas] ~# zpool replace zpool0 /dev/gptid/3937b1c2-fec4-11d5-a8b2-001f2961db70 /dev/ada2

Now just online the disk and ensure it says the new disk is resilvering:

[root@freenas] ~# zpool online zpool0 /dev/ada2
[root@freenas] ~# zpool status -v zpool0
  pool: zpool0
 state: DEGRADED
status: One or more devices is currently being resilvered.  The pool will
        continue to function, possibly in a degraded state.
action: Wait for the resilver to complete.
  scan: resilver in progress since Tue Sep  3 21:32:02 2013
        28.1M scanned out of 3.53T at 1.17M/s, (scan is slow, no estimated time)
        15.4M resilvered, 0.00% done
config:

        NAME                                            STATE     READ WRITE CKSUM
        zpool0                                          DEGRADED     0     0     0
          mirror-0                                      DEGRADED     0     0     0
            replacing-0                                 DEGRADED     0     0     0
              3282272283788900661                       OFFLINE      0     0     0  was /dev/gptid/3937b1c2-fec4-11d5-a8b2-001f2961db70
              ada2                                      ONLINE       0     0     0  (resilvering)
            gptid/398a9808-fec4-11d5-a8b2-001f2961db70  ONLINE       0     0     0
          mirror-1                                      ONLINE       0     0     0
            gptid/998b8dc4-ff2b-11d5-a8b2-001f2961db70  ONLINE       0     0     0
            gptid/99e507d9-ff2b-11d5-a8b2-001f2961db70  ONLINE       0     0     0


errors: No known data errors
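
The resilver on a pool this size takes a while, so it can be handy to pull just the progress figure out of the status output. Here is a minimal sketch, assuming the "resilvered, N% done" wording shown above (other ZFS versions may word this line differently). `resilver_percent` is a hypothetical name.

```sh
#!/bin/sh
# Hypothetical helper: read `zpool status` text on stdin and print the
# percentage from the "... resilvered, N% done" line, if there is one.
resilver_percent() {
    sed -n 's/.*resilvered, \([0-9.]*\)% done.*/\1/p'
}

# Example polling loop (commented out; needs a real pool to run against):
# while zpool status zpool0 | grep -q 'resilver in progress'; do
#     zpool status zpool0 | resilver_percent
#     sleep 60
# done
```

Once the resilver finishes, `zpool status` should no longer show "resilver in progress" and the loop exits.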

4 comments:

  1. Why aren't you using the GUI for this? When a disk is no longer detected, it'll show up as missing and can be replaced directly.

    Your manual guide misses some crucial parts: you are not creating a swap partition, you are not using GPTIDs (which is why the replacement disk shows up as /dev/ada2), and since you didn't use partitions you also aren't doing any 4k sector alignment, which could lead to performance penalties.

    As you can see the FreeNAS system does quite a bit of work behind the scenes, and if you don't have a deep knowledge of these internal processes you shouldn't mess with the CLI.

  2. According to your zpool status, the drive that was UNAVAILABLE was
    /dev/gptid/3937b1c2-fec4-11d5-a8b2-001f2961db

    Why did you zpool replace:
    /dev/gptid/3937b1c2-fec4-11d5-a8b2-001f2961db70

    i.e. why did you add the "70" to the information given by zpool status, or was one or the other a typo?

    Also, from mysticx, how do you use the GUI for this?

  3. You absolutely saved my life, I am glad there are people like you explaining the things that go wrong beyond the freenas support. A++ well done...
