Score: 1

Clear ZFS Checksum errors?


TL;DR: My ZFS mirror pool developed checksum errors. I swapped out the controller, thinking it was the most likely cause, but the errors won't clear. zpool clear resets them temporarily, but they come back the next time I run a scrub. How can I clear them for good?

Full story: I have had a ZFS mirror (mirror-0) set up and running on Ubuntu 20.04.2 LTS for some time. When one of the drives died, I took advantage of the failure to replace both drives with larger ones, and also added a SATA III PCI card for the new drives (the old ones had been connected to the on-board SATA II controller, as I had no more SATA III ports available). After a few weeks of running on the new drives and controller, ZFS reported checksum errors on both new drives and put the array into a "degraded" state as a result.

Some research led me to the conclusion that since both drives were showing the exact same number of checksum errors, it was much more likely to be an issue with the controller than with the drives themselves. So I pulled the new controller and put the drives back on the onboard SATA II controller for now, intending to replace the controller card once I verify that is the issue. I then deleted the two files that zpool status -v showed as having permanent errors, issued a zpool clear data to reset the errors, and ran a scrub.
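The clear-and-scrub cycle described above can be sketched as the script below. It is only a sketch: it defaults to a dry run that prints each command, the pool name data is taken from the post, and the run helper and the DRY_RUN and POOL variables are illustrative names, not part of ZFS. Note that zpool scrub returns immediately and the scrub continues in the background, so the final status call only shows the result once the scrub has finished.

```shell
#!/bin/sh
# Sketch of the clear-and-scrub cycle. Dry run by default: commands are
# printed, not executed. Set DRY_RUN=0 (as root) to run them for real.
POOL="${POOL:-data}"        # pool name from the post
DRY_RUN="${DRY_RUN:-1}"     # hypothetical switch, not a ZFS option

# run: print the command in dry-run mode, otherwise execute it.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

clear_and_scrub() {
    run zpool clear "$POOL"       # reset the per-device error counters
    run zpool scrub "$POOL"       # start a scrub (runs in the background)
    run zpool status -v "$POOL"   # re-check once the scrub has completed
}

clear_and_scrub
```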

Unfortunately, after the scrub the errors reappeared, only now zpool status -v no longer showed a file name, just an address (an inode, I believe), presumably for one of the files I had deleted earlier. I tried again, with the same result. Every time I run a scrub, it comes back with the following result:

root@watchman:~# zpool status -v
  pool: data
 state: DEGRADED
status: One or more devices has experienced an error resulting in data
    corruption.  Applications may be affected.
action: Restore the file in question if possible.  Otherwise restore the
    entire pool from backup.
   see: http://zfsonlinux.org/msg/ZFS-8000-8A
  scan: scrub repaired 16K in 0 days 09:10:20 with 1 errors on Sat Jul 24 15:48:21 2021
config:

    NAME                                 STATE     READ WRITE CKSUM
    data                                 DEGRADED     0     0     0
      mirror-0                           DEGRADED     0     0     0
        ata-ST8000VE000-2P6101_WSD1M5NW  DEGRADED     0     0    15  too many errors
        ata-ST8000VE000-2P6101_WSD1HEJX  DEGRADED     0     0    15  too many errors

errors: Permanent errors have been detected in the following files:

        data:<0x380508>
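
To compare the CKSUM column across the mirror members, following the reasoning that equal counts on both drives point at the controller, the counts can be pulled out of the status output with a short awk filter. This is a sketch: cksum_by_device is a hypothetical helper, and it assumes the device names start with ata- as they do in the output above.

```shell
#!/bin/sh
# cksum_by_device: read `zpool status` output on stdin and print
# "<device> <CKSUM count>" for each disk line.
# Assumption: disk names start with "ata-", as in the output above.
cksum_by_device() {
    awk '$1 ~ /^ata-/ && NF >= 5 { print $1, $5 }'
}

# Demo on the two device lines from the post; on a live system use:
#   zpool status data | cksum_by_device
cksum_by_device <<'EOF'
        ata-ST8000VE000-2P6101_WSD1M5NW  DEGRADED     0     0    15  too many errors
        ata-ST8000VE000-2P6101_WSD1HEJX  DEGRADED     0     0    15  too many errors
EOF
# prints:
#   ata-ST8000VE000-2P6101_WSD1M5NW 15
#   ata-ST8000VE000-2P6101_WSD1HEJX 15
```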

From what I can tell, this is just the same issue that already existed due, presumably, to the bad controller, but I can't seem to clear it out. How can I restore my mirror to a fully-functioning state?

UPDATE: I finally gave up on the idea of clearing the errors and started over instead. I created a new pool, stealing one of the drives from the existing mirror, then ran an rsync to copy all the data from the old pool to the new one. This did hit a few errors (ZFS wasn't lying about data errors), but nothing significant or troubling, and excluding the affected files allowed rsync to complete successfully. I then added the second drive to the new pool, and after a resilver everything looks good: a scrub on the new pool completed without error.
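
The rebuild described in the update might look roughly like the following. This is a sketch, not the exact commands that were run: the new pool name newdata, the /data and /newdata mount points, the exclude file path, the zpool destroy step, and the run helper are all assumptions, and the script only prints the commands unless DRY_RUN=0 is set. The disk IDs are taken from the status output earlier in the post.

```shell
#!/bin/sh
# Sketch of the rebuild from the update. Dry run by default: commands are
# printed, not executed. Set DRY_RUN=0 (as root) only after reviewing them.
OLD_POOL="${OLD_POOL:-data}"       # pool name from the post
NEW_POOL="${NEW_POOL:-newdata}"    # hypothetical name for the new pool
DISK_A=/dev/disk/by-id/ata-ST8000VE000-2P6101_WSD1M5NW
DISK_B=/dev/disk/by-id/ata-ST8000VE000-2P6101_WSD1HEJX
EXCLUDES=/root/rsync-excludes.txt  # hypothetical list of unreadable files
DRY_RUN="${DRY_RUN:-1}"            # hypothetical switch, not a ZFS option

# run: print the command in dry-run mode, otherwise execute it.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

rebuild_plan() {
    # Take one disk out of the old mirror and build a single-disk pool on it.
    run zpool detach "$OLD_POOL" "$DISK_A"
    run zpool create "$NEW_POOL" "$DISK_A"
    # Copy everything except the files that fail to read (mount points assumed).
    run rsync -a --exclude-from="$EXCLUDES" "/$OLD_POOL/" "/$NEW_POOL/"
    # Assumed step: retire the old pool so its remaining disk can be reused.
    run zpool destroy "$OLD_POOL"
    # Attach the second disk to form a mirror; this starts a resilver.
    run zpool attach "$NEW_POOL" "$DISK_A" "$DISK_B"
    # Verify the new pool once the resilver has finished.
    run zpool scrub "$NEW_POOL"
}

rebuild_plan
```

Keeping the default dry run makes it possible to review the whole sequence before any destructive step, such as zpool destroy, is actually executed.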

So assuming everything continues to look good for the next week or so, I think it's safe to assume the SATA III card was the cause of the issue, and replace it with a better brand/option :)

djdomi: I believe it's time to take a backup and check for faulty hardware.
ibrewster: @djdomi Yes, I believe the controller is faulty. I have pulled it, but since I can't clear the existing errors, it's hard to confirm whether that is really the case.
