CHKSUM errors on a just-attached disk
System information
Type | Version/Name |
---|---|
Distribution Name | Ubuntu Server |
Distribution Version | 18.04.2 LTS |
Linux Kernel | 4.15.0-48 |
Architecture | x86_64 |
ZFS Version | 0.7.5-1ubuntu16.4 |
SPL Version | 0.7.5-1ubuntu1 |
Describe the problem you're observing
The issue was initially found on a Nexenta system, but it can be reproduced on Linux as well. We first thought it was a regression introduced by the integration of sequential resilvering, but it can also be reproduced with sequential resilvering disabled, just not as quickly. So it looks like an old, general issue.
There is code that handles an unfinished resilver, but it does not seem to cover this case fully: a scrub can issue READ zios to a just-attached disk, which causes the CHKSUM errors.
Describe how to reproduce the problem
Create two RAM disks, 200 MiB each (rd_size is given in KiB)
modprobe brd rd_nr=2 rd_size=204800
Create a mirrored pool on the disks
zpool create -o cachefile=none rtest mirror /dev/ram0 /dev/ram1
Run the following three loops in parallel (a combined wrapper is sketched below, after the monitoring command).
while true; do
dd if=/dev/urandom of=/rtest/f bs=1M count=90
done
while true; do
zpool scrub rtest
done
while true; do
zpool detach rtest ram1
zpool attach rtest ram0 /dev/ram1
sleep 10
done
And monitor the status of the pool
zpool status rtest 1
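For convenience, the three loops and the status monitor can be launched from a single wrapper. This is only a sketch of the procedure above; the trap-based cleanup is an addition, not part of the original report:

#!/bin/bash
# Sketch: run the writer, scrubber and detach/attach loops from the
# reproducer above in the background, then watch the pool status.
trap 'kill $(jobs -p) 2>/dev/null' EXIT

while true; do dd if=/dev/urandom of=/rtest/f bs=1M count=90; done &
while true; do zpool scrub rtest; done &
while true; do
    zpool detach rtest ram1
    zpool attach rtest ram0 /dev/ram1
    sleep 10
done &

# Print the pool status every second until interrupted
zpool status rtest 1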
Include any warning/errors/backtraces from the system logs
pool: rtest
state: ONLINE
status: One or more devices has experienced an unrecoverable error. An
attempt was made to correct the error. Applications are unaffected.
action: Determine if the device needs to be replaced, and clear the errors
using 'zpool clear' or replace the device with 'zpool replace'.
see: http://zfsonlinux.org/msg/ZFS-8000-9P
scan: scrub in progress since Mon Apr 29 20:39:50 2019
87.5M scanned out of 88.0M at 87.5M/s, 0h0m to go
0B repaired, 99.43% done
config:
NAME STATE READ WRITE CKSUM
rtest ONLINE 0 0 0
mirror-0 ONLINE 0 0 0
ram0 ONLINE 0 0 0
ram1 ONLINE 0 0 10
errors: No known data errors
Hi Roman, I might have another reproducer for a similar issue. Can you add the ereport?
ereport? What is it?
FMA ereport. On illumos: fmdump -eV
or on ZoL: zpool events -v
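For reference, checksum ereports can also be watched live while the reproducer runs. A minimal sketch using the follow mode of zpool events; the grep filter on the checksum ereport class is only an illustration:

# Follow new events for the test pool and show only checksum ereports
zpool events -vf rtest | grep -A 4 'ereport.fs.zfs.checksum'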
# zpool status rtest 1
pool: rtest
state: ONLINE
status: One or more devices has experienced an unrecoverable error. An
attempt was made to correct the error. Applications are unaffected.
action: Determine if the device needs to be replaced, and clear the errors
using 'zpool clear' or replace the device with 'zpool replace'.
see: http://zfsonlinux.org/msg/ZFS-8000-9P
scan: scrub repaired 0B in 0h0m with 0 errors on Tue Apr 30 19:29:46 2019
config:
NAME STATE READ WRITE CKSUM
rtest ONLINE 0 0 0
mirror-0 ONLINE 0 0 0
ram0 ONLINE 0 0 0
ram1 ONLINE 0 0 47
errors: No known data errors
There are no errors in the report:
TIME CLASS
Apr 30 2019 19:29:42.878638170 sysevent.fs.zfs.scrub_start
version = 0x0
class = "sysevent.fs.zfs.scrub_start"
pool = "rtest"
pool_guid = 0x4888d3300934ecba
pool_state = 0x0
pool_context = 0x0
time = 0x5cc8a226 0x345ef45a
eid = 0x205
Apr 30 2019 19:29:42.878638170 sysevent.fs.zfs.history_event
version = 0x0
class = "sysevent.fs.zfs.history_event"
pool = "rtest"
pool_guid = 0x4888d3300934ecba
pool_state = 0x0
pool_context = 0x0
history_hostname = "zol-devel"
history_internal_str = "func=1 mintxg=0 maxtxg=2473"
history_internal_name = "scan setup"
history_txg = 0x9a9
history_time = 0x5cc8a226
time = 0x5cc8a226 0x345ef45a
eid = 0x206
Apr 30 2019 19:29:43.038657548 sysevent.fs.zfs.history_event
version = 0x0
class = "sysevent.fs.zfs.history_event"
pool = "rtest"
pool_guid = 0x4888d3300934ecba
pool_state = 0x0
pool_context = 0x0
history_hostname = "zol-devel"
history_internal_str = "errors=0"
history_internal_name = "scan done"
history_txg = 0x9aa
history_time = 0x5cc8a227
time = 0x5cc8a227 0x24dde0c
eid = 0x207
Apr 30 2019 19:29:43.038657548 sysevent.fs.zfs.scrub_finish
version = 0x0
class = "sysevent.fs.zfs.scrub_finish"
pool = "rtest"
pool_guid = 0x4888d3300934ecba
pool_state = 0x0
pool_context = 0x0
time = 0x5cc8a227 0x24dde0c
eid = 0x208
Apr 30 2019 19:29:43.090663852 sysevent.fs.zfs.scrub_start
version = 0x0
class = "sysevent.fs.zfs.scrub_start"
pool = "rtest"
pool_guid = 0x4888d3300934ecba
pool_state = 0x0
pool_context = 0x0
time = 0x5cc8a227 0x5676bac
eid = 0x209
Apr 30 2019 19:29:43.090663852 sysevent.fs.zfs.history_event
version = 0x0
class = "sysevent.fs.zfs.history_event"
pool = "rtest"
pool_guid = 0x4888d3300934ecba
pool_state = 0x0
pool_context = 0x0
history_hostname = "zol-devel"
history_internal_str = "func=1 mintxg=0 maxtxg=2476"
history_internal_name = "scan setup"
history_txg = 0x9ac
history_time = 0x5cc8a227
time = 0x5cc8a227 0x5676bac
eid = 0x20a
Apr 30 2019 19:29:43.270685676 sysevent.fs.zfs.history_event
version = 0x0
class = "sysevent.fs.zfs.history_event"
pool = "rtest"
pool_guid = 0x4888d3300934ecba
pool_state = 0x0
pool_context = 0x0
history_hostname = "zol-devel"
history_internal_str = "errors=0"
history_internal_name = "scan done"
history_txg = 0x9ad
history_time = 0x5cc8a227
time = 0x5cc8a227 0x102255ec
eid = 0x20b
Apr 30 2019 19:29:43.270685676 sysevent.fs.zfs.scrub_finish
version = 0x0
class = "sysevent.fs.zfs.scrub_finish"
pool = "rtest"
pool_guid = 0x4888d3300934ecba
pool_state = 0x0
pool_context = 0x0
time = 0x5cc8a227 0x102255ec
eid = 0x20c
Apr 30 2019 19:29:43.322691981 sysevent.fs.zfs.scrub_start
version = 0x0
class = "sysevent.fs.zfs.scrub_start"
pool = "rtest"
pool_guid = 0x4888d3300934ecba
pool_state = 0x0
pool_context = 0x0
time = 0x5cc8a227 0x133be38d
eid = 0x20d
Apr 30 2019 19:29:43.322691981 sysevent.fs.zfs.history_event
version = 0x0
class = "sysevent.fs.zfs.history_event"
pool = "rtest"
pool_guid = 0x4888d3300934ecba
pool_state = 0x0
pool_context = 0x0
history_hostname = "zol-devel"
history_internal_str = "func=1 mintxg=0 maxtxg=2479"
history_internal_name = "scan setup"
history_txg = 0x9af
history_time = 0x5cc8a227
time = 0x5cc8a227 0x133be38d
eid = 0x20e
Apr 30 2019 19:29:43.490712349 sysevent.fs.zfs.history_event
version = 0x0
class = "sysevent.fs.zfs.history_event"
pool = "rtest"
pool_guid = 0x4888d3300934ecba
pool_state = 0x0
pool_context = 0x0
history_hostname = "zol-devel"
history_internal_str = "errors=0"
history_internal_name = "scan done"
history_txg = 0x9b0
history_time = 0x5cc8a227
time = 0x5cc8a227 0x1d3fad1d
eid = 0x20f
Apr 30 2019 19:29:43.490712349 sysevent.fs.zfs.scrub_finish
version = 0x0
class = "sysevent.fs.zfs.scrub_finish"
pool = "rtest"
pool_guid = 0x4888d3300934ecba
pool_state = 0x0
pool_context = 0x0
time = 0x5cc8a227 0x1d3fad1d
eid = 0x210
Apr 30 2019 19:29:43.554720109 sysevent.fs.zfs.scrub_start
version = 0x0
class = "sysevent.fs.zfs.scrub_start"
pool = "rtest"
pool_guid = 0x4888d3300934ecba
pool_state = 0x0
pool_context = 0x0
time = 0x5cc8a227 0x21105b6d
eid = 0x211
Apr 30 2019 19:29:43.562721079 sysevent.fs.zfs.history_event
version = 0x0
class = "sysevent.fs.zfs.history_event"
pool = "rtest"
pool_guid = 0x4888d3300934ecba
pool_state = 0x0
pool_context = 0x0
history_hostname = "zol-devel"
history_internal_str = "func=1 mintxg=0 maxtxg=2486"
history_internal_name = "scan setup"
history_txg = 0x9b6
history_time = 0x5cc8a227
time = 0x5cc8a227 0x218a7137
eid = 0x212
Apr 30 2019 19:29:43.742742902 sysevent.fs.zfs.history_event
version = 0x0
class = "sysevent.fs.zfs.history_event"
pool = "rtest"
pool_guid = 0x4888d3300934ecba
pool_state = 0x0
pool_context = 0x0
history_hostname = "zol-devel"
history_internal_str = "errors=0"
history_internal_name = "scan done"
history_txg = 0x9b7
history_time = 0x5cc8a227
time = 0x5cc8a227 0x2c455b76
eid = 0x213
Apr 30 2019 19:29:43.742742902 sysevent.fs.zfs.scrub_finish
version = 0x0
class = "sysevent.fs.zfs.scrub_finish"
pool = "rtest"
pool_guid = 0x4888d3300934ecba
pool_state = 0x0
pool_context = 0x0
time = 0x5cc8a227 0x2c455b76
eid = 0x214
Apr 30 2019 19:29:43.766745812 sysevent.fs.zfs.scrub_start
version = 0x0
class = "sysevent.fs.zfs.scrub_start"
pool = "rtest"
pool_guid = 0x4888d3300934ecba
pool_state = 0x0
pool_context = 0x0
time = 0x5cc8a227 0x2db39cd4
eid = 0x215
Apr 30 2019 19:29:43.766745812 sysevent.fs.zfs.history_event
version = 0x0
class = "sysevent.fs.zfs.history_event"
pool = "rtest"
pool_guid = 0x4888d3300934ecba
pool_state = 0x0
pool_context = 0x0
history_hostname = "zol-devel"
history_internal_str = "func=1 mintxg=0 maxtxg=2489"
history_internal_name = "scan setup"
history_txg = 0x9b9
history_time = 0x5cc8a227
time = 0x5cc8a227 0x2db39cd4
eid = 0x216
Apr 30 2019 19:29:43.958769091 sysevent.fs.zfs.history_event
version = 0x0
class = "sysevent.fs.zfs.history_event"
pool = "rtest"
pool_guid = 0x4888d3300934ecba
pool_state = 0x0
pool_context = 0x0
history_hostname = "zol-devel"
history_internal_str = "errors=0"
history_internal_name = "scan done"
history_txg = 0x9ba
history_time = 0x5cc8a227
time = 0x5cc8a227 0x3925a7c3
eid = 0x217
Apr 30 2019 19:29:43.958769091 sysevent.fs.zfs.scrub_finish
version = 0x0
class = "sysevent.fs.zfs.scrub_finish"
pool = "rtest"
pool_guid = 0x4888d3300934ecba
pool_state = 0x0
pool_context = 0x0
time = 0x5cc8a227 0x3925a7c3
eid = 0x218
Apr 30 2019 19:29:44.002774424 sysevent.fs.zfs.scrub_start
version = 0x0
class = "sysevent.fs.zfs.scrub_start"
pool = "rtest"
pool_guid = 0x4888d3300934ecba
pool_state = 0x0
pool_context = 0x0
time = 0x5cc8a228 0x2a5598
eid = 0x219
Apr 30 2019 19:29:44.002774424 sysevent.fs.zfs.history_event
version = 0x0
class = "sysevent.fs.zfs.history_event"
pool = "rtest"
pool_guid = 0x4888d3300934ecba
pool_state = 0x0
pool_context = 0x0
history_hostname = "zol-devel"
history_internal_str = "func=1 mintxg=0 maxtxg=2492"
history_internal_name = "scan setup"
history_txg = 0x9bc
history_time = 0x5cc8a228
time = 0x5cc8a228 0x2a5598
eid = 0x21a
Apr 30 2019 19:29:44.222801136 sysevent.fs.zfs.history_event
version = 0x0
class = "sysevent.fs.zfs.history_event"
pool = "rtest"
pool_guid = 0x4888d3300934ecba
pool_state = 0x0
pool_context = 0x0
history_hostname = "zol-devel"
history_internal_str = "errors=0"
history_internal_name = "scan done"
history_txg = 0x9bd
history_time = 0x5cc8a228
time = 0x5cc8a228 0xd47acf0
eid = 0x21b
Apr 30 2019 19:29:44.222801136 sysevent.fs.zfs.scrub_finish
version = 0x0
class = "sysevent.fs.zfs.scrub_finish"
pool = "rtest"
pool_guid = 0x4888d3300934ecba
pool_state = 0x0
pool_context = 0x0
time = 0x5cc8a228 0xd47acf0
eid = 0x21c
Apr 30 2019 19:29:44.250804535 sysevent.fs.zfs.scrub_start
version = 0x0
class = "sysevent.fs.zfs.scrub_start"
pool = "rtest"
pool_guid = 0x4888d3300934ecba
pool_state = 0x0
pool_context = 0x0
time = 0x5cc8a228 0xef2f937
eid = 0x21d
Apr 30 2019 19:29:44.250804535 sysevent.fs.zfs.history_event
version = 0x0
class = "sysevent.fs.zfs.history_event"
pool = "rtest"
pool_guid = 0x4888d3300934ecba
pool_state = 0x0
pool_context = 0x0
history_hostname = "zol-devel"
history_internal_str = "func=1 mintxg=0 maxtxg=2495"
history_internal_name = "scan setup"
history_txg = 0x9bf
history_time = 0x5cc8a228
time = 0x5cc8a228 0xef2f937
eid = 0x21e
Apr 30 2019 19:29:44.438827360 sysevent.fs.zfs.history_event
version = 0x0
class = "sysevent.fs.zfs.history_event"
pool = "rtest"
pool_guid = 0x4888d3300934ecba
pool_state = 0x0
pool_context = 0x0
history_hostname = "zol-devel"
history_internal_str = "errors=0"
history_internal_name = "scan done"
history_txg = 0x9c0
history_time = 0x5cc8a228
time = 0x5cc8a228 0x1a27f960
eid = 0x21f
Apr 30 2019 19:29:44.438827360 sysevent.fs.zfs.scrub_finish
version = 0x0
class = "sysevent.fs.zfs.scrub_finish"
pool = "rtest"
pool_guid = 0x4888d3300934ecba
pool_state = 0x0
pool_context = 0x0
time = 0x5cc8a228 0x1a27f960
eid = 0x220
Apr 30 2019 19:29:44.474831731 sysevent.fs.zfs.scrub_start
version = 0x0
class = "sysevent.fs.zfs.scrub_start"
pool = "rtest"
pool_guid = 0x4888d3300934ecba
pool_state = 0x0
pool_context = 0x0
time = 0x5cc8a228 0x1c4d5b73
eid = 0x221
Apr 30 2019 19:29:44.474831731 sysevent.fs.zfs.history_event
version = 0x0
class = "sysevent.fs.zfs.history_event"
pool = "rtest"
pool_guid = 0x4888d3300934ecba
pool_state = 0x0
pool_context = 0x0
history_hostname = "zol-devel"
history_internal_str = "func=1 mintxg=0 maxtxg=2498"
history_internal_name = "scan setup"
history_txg = 0x9c2
history_time = 0x5cc8a228
time = 0x5cc8a228 0x1c4d5b73
eid = 0x222
Apr 30 2019 19:29:44.662854556 sysevent.fs.zfs.history_event
version = 0x0
class = "sysevent.fs.zfs.history_event"
pool = "rtest"
pool_guid = 0x4888d3300934ecba
pool_state = 0x0
pool_context = 0x0
history_hostname = "zol-devel"
history_internal_str = "errors=0"
history_internal_name = "scan done"
history_txg = 0x9c3
history_time = 0x5cc8a228
time = 0x5cc8a228 0x27825b9c
eid = 0x223
Apr 30 2019 19:29:44.662854556 sysevent.fs.zfs.scrub_finish
version = 0x0
class = "sysevent.fs.zfs.scrub_finish"
pool = "rtest"
pool_guid = 0x4888d3300934ecba
pool_state = 0x0
pool_context = 0x0
time = 0x5cc8a228 0x27825b9c
eid = 0x224
Apr 30 2019 19:29:44.690857955 sysevent.fs.zfs.scrub_start
version = 0x0
class = "sysevent.fs.zfs.scrub_start"
pool = "rtest"
pool_guid = 0x4888d3300934ecba
pool_state = 0x0
pool_context = 0x0
time = 0x5cc8a228 0x292da7e3
eid = 0x225
Apr 30 2019 19:29:44.690857955 sysevent.fs.zfs.history_event
version = 0x0
class = "sysevent.fs.zfs.history_event"
pool = "rtest"
pool_guid = 0x4888d3300934ecba
pool_state = 0x0
pool_context = 0x0
history_hostname = "zol-devel"
history_internal_str = "func=1 mintxg=0 maxtxg=2501"
history_internal_name = "scan setup"
history_txg = 0x9c5
history_time = 0x5cc8a228
time = 0x5cc8a228 0x292da7e3
eid = 0x226
Apr 30 2019 19:29:44.874880295 sysevent.fs.zfs.history_event
version = 0x0
class = "sysevent.fs.zfs.history_event"
pool = "rtest"
pool_guid = 0x4888d3300934ecba
pool_state = 0x0
pool_context = 0x0
history_hostname = "zol-devel"
history_internal_str = "errors=0"
history_internal_name = "scan done"
history_txg = 0x9c6
history_time = 0x5cc8a228
time = 0x5cc8a228 0x34259d27
eid = 0x227
Apr 30 2019 19:29:44.874880295 sysevent.fs.zfs.scrub_finish
version = 0x0
class = "sysevent.fs.zfs.scrub_finish"
pool = "rtest"
pool_guid = 0x4888d3300934ecba
pool_state = 0x0
pool_context = 0x0
time = 0x5cc8a228 0x34259d27
eid = 0x228
Apr 30 2019 19:29:44.930887093 sysevent.fs.zfs.scrub_start
version = 0x0
class = "sysevent.fs.zfs.scrub_start"
pool = "rtest"
pool_guid = 0x4888d3300934ecba
pool_state = 0x0
pool_context = 0x0
time = 0x5cc8a228 0x377c35b5
eid = 0x229
Apr 30 2019 19:29:44.930887093 sysevent.fs.zfs.history_event
version = 0x0
class = "sysevent.fs.zfs.history_event"
pool = "rtest"
pool_guid = 0x4888d3300934ecba
pool_state = 0x0
pool_context = 0x0
history_hostname = "zol-devel"
history_internal_str = "func=1 mintxg=0 maxtxg=2506"
history_internal_name = "scan setup"
history_txg = 0x9ca
history_time = 0x5cc8a228
time = 0x5cc8a228 0x377c35b5
eid = 0x22a
Apr 30 2019 19:29:45.122910444 sysevent.fs.zfs.history_event
version = 0x0
class = "sysevent.fs.zfs.history_event"
pool = "rtest"
pool_guid = 0x4888d3300934ecba
pool_state = 0x0
pool_context = 0x0
history_hostname = "zol-devel"
history_internal_str = "errors=0"
history_internal_name = "scan done"
history_txg = 0x9cb
history_time = 0x5cc8a229
time = 0x5cc8a229 0x75376ec
eid = 0x22b
Apr 30 2019 19:29:45.122910444 sysevent.fs.zfs.scrub_finish
version = 0x0
class = "sysevent.fs.zfs.scrub_finish"
pool = "rtest"
pool_guid = 0x4888d3300934ecba
pool_state = 0x0
pool_context = 0x0
time = 0x5cc8a229 0x75376ec
eid = 0x22c
Apr 30 2019 19:29:45.170916288 sysevent.fs.zfs.scrub_start
version = 0x0
class = "sysevent.fs.zfs.scrub_start"
pool = "rtest"
pool_guid = 0x4888d3300934ecba
pool_state = 0x0
pool_context = 0x0
time = 0x5cc8a229 0xa2ff9c0
eid = 0x22d
Apr 30 2019 19:29:45.170916288 sysevent.fs.zfs.history_event
version = 0x0
class = "sysevent.fs.zfs.history_event"
pool = "rtest"
pool_guid = 0x4888d3300934ecba
pool_state = 0x0
pool_context = 0x0
history_hostname = "zol-devel"
history_internal_str = "func=1 mintxg=0 maxtxg=2509"
history_internal_name = "scan setup"
history_txg = 0x9cd
history_time = 0x5cc8a229
time = 0x5cc8a229 0xa2ff9c0
eid = 0x22e
Apr 30 2019 19:29:45.390943072 sysevent.fs.zfs.history_event
version = 0x0
class = "sysevent.fs.zfs.history_event"
pool = "rtest"
pool_guid = 0x4888d3300934ecba
pool_state = 0x0
pool_context = 0x0
history_hostname = "zol-devel"
history_internal_str = "errors=0"
history_internal_name = "scan done"
history_txg = 0x9ce
history_time = 0x5cc8a229
time = 0x5cc8a229 0x174d5160
eid = 0x22f
Apr 30 2019 19:29:45.390943072 sysevent.fs.zfs.scrub_finish
version = 0x0
class = "sysevent.fs.zfs.scrub_finish"
pool = "rtest"
pool_guid = 0x4888d3300934ecba
pool_state = 0x0
pool_context = 0x0
time = 0x5cc8a229 0x174d5160
eid = 0x230
Apr 30 2019 19:29:45.470952811 sysevent.fs.zfs.scrub_start
version = 0x0
class = "sysevent.fs.zfs.scrub_start"
pool = "rtest"
pool_guid = 0x4888d3300934ecba
pool_state = 0x0
pool_context = 0x0
time = 0x5cc8a229 0x1c122b6b
eid = 0x231
Apr 30 2019 19:29:45.470952811 sysevent.fs.zfs.history_event
version = 0x0
class = "sysevent.fs.zfs.history_event"
pool = "rtest"
pool_guid = 0x4888d3300934ecba
pool_state = 0x0
pool_context = 0x0
history_hostname = "zol-devel"
history_internal_str = "func=1 mintxg=0 maxtxg=2516"
history_internal_name = "scan setup"
history_txg = 0x9d4
history_time = 0x5cc8a229
time = 0x5cc8a229 0x1c122b6b
eid = 0x232
Apr 30 2019 19:29:45.670977160 sysevent.fs.zfs.history_event
version = 0x0
class = "sysevent.fs.zfs.history_event"
pool = "rtest"
pool_guid = 0x4888d3300934ecba
pool_state = 0x0
pool_context = 0x0
history_hostname = "zol-devel"
history_internal_str = "errors=0"
history_internal_name = "scan done"
history_txg = 0x9d5
history_time = 0x5cc8a229
time = 0x5cc8a229 0x27fe4c88
eid = 0x233
Apr 30 2019 19:29:45.670977160 sysevent.fs.zfs.scrub_finish
version = 0x0
class = "sysevent.fs.zfs.scrub_finish"
pool = "rtest"
pool_guid = 0x4888d3300934ecba
pool_state = 0x0
pool_context = 0x0
time = 0x5cc8a229 0x27fe4c88
eid = 0x234
Apr 30 2019 19:29:45.694980082 sysevent.fs.zfs.scrub_start
version = 0x0
class = "sysevent.fs.zfs.scrub_start"
pool = "rtest"
pool_guid = 0x4888d3300934ecba
pool_state = 0x0
pool_context = 0x0
time = 0x5cc8a229 0x296c8df2
eid = 0x235
Apr 30 2019 19:29:45.694980082 sysevent.fs.zfs.history_event
version = 0x0
class = "sysevent.fs.zfs.history_event"
pool = "rtest"
pool_guid = 0x4888d3300934ecba
pool_state = 0x0
pool_context = 0x0
history_hostname = "zol-devel"
history_internal_str = "func=1 mintxg=0 maxtxg=2519"
history_internal_name = "scan setup"
history_txg = 0x9d7
history_time = 0x5cc8a229
time = 0x5cc8a229 0x296c8df2
eid = 0x236
Apr 30 2019 19:29:45.883002970 sysevent.fs.zfs.history_event
version = 0x0
class = "sysevent.fs.zfs.history_event"
pool = "rtest"
pool_guid = 0x4888d3300934ecba
pool_state = 0x0
pool_context = 0x0
history_hostname = "zol-devel"
history_internal_str = "errors=0"
history_internal_name = "scan done"
history_txg = 0x9d8
history_time = 0x5cc8a229
time = 0x5cc8a229 0x34a18e5a
eid = 0x237
Apr 30 2019 19:29:45.883002970 sysevent.fs.zfs.scrub_finish
version = 0x0
class = "sysevent.fs.zfs.scrub_finish"
pool = "rtest"
pool_guid = 0x4888d3300934ecba
pool_state = 0x0
pool_context = 0x0
time = 0x5cc8a229 0x34a18e5a
eid = 0x238
Apr 30 2019 19:29:45.939009788 sysevent.fs.zfs.scrub_start
version = 0x0
class = "sysevent.fs.zfs.scrub_start"
pool = "rtest"
pool_guid = 0x4888d3300934ecba
pool_state = 0x0
pool_context = 0x0
time = 0x5cc8a229 0x37f826fc
eid = 0x239
Apr 30 2019 19:29:45.939009788 sysevent.fs.zfs.history_event
version = 0x0
class = "sysevent.fs.zfs.history_event"
pool = "rtest"
pool_guid = 0x4888d3300934ecba
pool_state = 0x0
pool_context = 0x0
history_hostname = "zol-devel"
history_internal_str = "func=1 mintxg=0 maxtxg=2525"
history_internal_name = "scan setup"
history_txg = 0x9dd
history_time = 0x5cc8a229
time = 0x5cc8a229 0x37f826fc
eid = 0x23a
Apr 30 2019 19:29:46.123032209 sysevent.fs.zfs.history_event
version = 0x0
class = "sysevent.fs.zfs.history_event"
pool = "rtest"
pool_guid = 0x4888d3300934ecba
pool_state = 0x0
pool_context = 0x0
history_hostname = "zol-devel"
history_internal_str = "errors=0"
history_internal_name = "scan done"
history_txg = 0x9de
history_time = 0x5cc8a22a
time = 0x5cc8a22a 0x7555291
eid = 0x23b
Apr 30 2019 19:29:46.123032209 sysevent.fs.zfs.scrub_finish
version = 0x0
class = "sysevent.fs.zfs.scrub_finish"
pool = "rtest"
pool_guid = 0x4888d3300934ecba
pool_state = 0x0
pool_context = 0x0
time = 0x5cc8a22a 0x7555291
eid = 0x23c
Apr 30 2019 19:29:46.151035622 sysevent.fs.zfs.scrub_start
version = 0x0
class = "sysevent.fs.zfs.scrub_start"
pool = "rtest"
pool_guid = 0x4888d3300934ecba
pool_state = 0x0
pool_context = 0x0
time = 0x5cc8a22a 0x9009ee6
eid = 0x23d
Apr 30 2019 19:29:46.151035622 sysevent.fs.zfs.history_event
version = 0x0
class = "sysevent.fs.zfs.history_event"
pool = "rtest"
pool_guid = 0x4888d3300934ecba
pool_state = 0x0
pool_context = 0x0
history_hostname = "zol-devel"
history_internal_str = "func=1 mintxg=0 maxtxg=2528"
history_internal_name = "scan setup"
history_txg = 0x9e0
history_time = 0x5cc8a22a
time = 0x5cc8a22a 0x9009ee6
eid = 0x23e
Apr 30 2019 19:29:46.335058054 sysevent.fs.zfs.history_event
version = 0x0
class = "sysevent.fs.zfs.history_event"
pool = "rtest"
pool_guid = 0x4888d3300934ecba
pool_state = 0x0
pool_context = 0x0
history_hostname = "zol-devel"
history_internal_str = "errors=0"
history_internal_name = "scan done"
history_txg = 0x9e1
history_time = 0x5cc8a22a
time = 0x5cc8a22a 0x13f89486
eid = 0x23f
Apr 30 2019 19:29:46.335058054 sysevent.fs.zfs.scrub_finish
version = 0x0
class = "sysevent.fs.zfs.scrub_finish"
pool = "rtest"
pool_guid = 0x4888d3300934ecba
pool_state = 0x0
pool_context = 0x0
time = 0x5cc8a22a 0x13f89486
eid = 0x240
Apr 30 2019 19:29:46.371062443 sysevent.fs.zfs.scrub_start
version = 0x0
class = "sysevent.fs.zfs.scrub_start"
pool = "rtest"
pool_guid = 0x4888d3300934ecba
pool_state = 0x0
pool_context = 0x0
time = 0x5cc8a22a 0x161df6ab
eid = 0x241
Apr 30 2019 19:29:46.371062443 sysevent.fs.zfs.history_event
version = 0x0
class = "sysevent.fs.zfs.history_event"
pool = "rtest"
pool_guid = 0x4888d3300934ecba
pool_state = 0x0
pool_context = 0x0
history_hostname = "zol-devel"
history_internal_str = "func=1 mintxg=0 maxtxg=2531"
history_internal_name = "scan setup"
history_txg = 0x9e3
history_time = 0x5cc8a22a
time = 0x5cc8a22a 0x161df6ab
eid = 0x242
Apr 30 2019 19:29:46.563085850 sysevent.fs.zfs.history_event
version = 0x0
class = "sysevent.fs.zfs.history_event"
pool = "rtest"
pool_guid = 0x4888d3300934ecba
pool_state = 0x0
pool_context = 0x0
history_hostname = "zol-devel"
history_internal_str = "errors=0"
history_internal_name = "scan done"
history_txg = 0x9e4
history_time = 0x5cc8a22a
time = 0x5cc8a22a 0x2190021a
eid = 0x243
Apr 30 2019 19:29:46.563085850 sysevent.fs.zfs.scrub_finish
version = 0x0
class = "sysevent.fs.zfs.scrub_finish"
pool = "rtest"
pool_guid = 0x4888d3300934ecba
pool_state = 0x0
pool_context = 0x0
time = 0x5cc8a22a 0x2190021a
eid = 0x244
!?! Every checksum error should also generate an ereport. Does your build include 2bbec1c91?
It seems the change is part of the 0.8.x release, but the test system is running 0.7.5. I will build a 0.8.x release and repeat the test.
I've built 0.8.0-rc4. The issue is the same, and there are no related events.
I can reproduce @Ramzec's results with 0.8.0-rc4 on an Ubuntu 18 VM. I also didn't see any checksum events:
hutter@ubuntu18:~/zfs$ ./cmd/zpool/zpool status
pool: rtest
state: ONLINE
status: One or more devices has experienced an unrecoverable error. An
attempt was made to correct the error. Applications are unaffected.
action: Determine if the device needs to be replaced, and clear the errors
using 'zpool clear' or replace the device with 'zpool replace'.
see: http://zfsonlinux.org/msg/ZFS-8000-9P
scan: scrub in progress since Thu Feb 21 20:18:09 2019
87.3M scanned at 87.3M/s, 22.2M issued at 22.2M/s, 88.5M total
0B repaired, 25.02% done, 0 days 00:00:02 to go
config:
NAME STATE READ WRITE CKSUM
rtest ONLINE 0 0 0
mirror-0 ONLINE 0 0 0
ram0 ONLINE 0 0 162
ram1 ONLINE 0 0 54
errors: No known data errors
hutter@ubuntu18:~/zfs$ ./cmd/zpool/zpool events
TIME CLASS
cannot get event: permission denied
hutter@ubuntu18:~/zfs$ sudo ./cmd/zpool/zpool events
[sudo] password for hutter:
TIME CLASS
dropped 285 events
Feb 21 2019 20:18:22.900000000 sysevent.fs.zfs.history_event
Feb 21 2019 20:18:22.900000000 sysevent.fs.zfs.scrub_finish
Feb 21 2019 20:18:22.924000000 sysevent.fs.zfs.scrub_start
Feb 21 2019 20:18:22.924000000 sysevent.fs.zfs.history_event
Feb 21 2019 20:18:23.036000000 sysevent.fs.zfs.history_event
Feb 21 2019 20:18:23.036000000 sysevent.fs.zfs.scrub_finish
Feb 21 2019 20:18:23.056000000 sysevent.fs.zfs.scrub_start
Feb 21 2019 20:18:23.056000000 sysevent.fs.zfs.history_event
Feb 21 2019 20:18:23.116000000 sysevent.fs.zfs.history_event
Feb 21 2019 20:18:23.116000000 sysevent.fs.zfs.scrub_finish
Feb 21 2019 20:18:23.140000000 sysevent.fs.zfs.scrub_start
Feb 21 2019 20:18:23.140000000 sysevent.fs.zfs.history_event
Feb 21 2019 20:18:23.244000000 sysevent.fs.zfs.history_event
Feb 21 2019 20:18:23.244000000 sysevent.fs.zfs.scrub_finish
Feb 21 2019 20:18:23.288000000 sysevent.fs.zfs.scrub_start
Feb 21 2019 20:18:23.292000000 sysevent.fs.zfs.history_event
Feb 21 2019 20:18:23.428000000 sysevent.fs.zfs.history_event
Feb 21 2019 20:18:23.428000000 sysevent.fs.zfs.scrub_finish
Feb 21 2019 20:18:23.468000000 sysevent.fs.zfs.scrub_start
Feb 21 2019 20:18:23.468000000 sysevent.fs.zfs.history_event
Feb 21 2019 20:18:23.572000000 sysevent.fs.zfs.history_event
Feb 21 2019 20:18:23.572000000 sysevent.fs.zfs.scrub_finish
Feb 21 2019 20:18:23.600000000 sysevent.fs.zfs.scrub_start
Feb 21 2019 20:18:23.600000000 sysevent.fs.zfs.history_event
Feb 21 2019 20:18:23.720000000 sysevent.fs.zfs.history_event
Feb 21 2019 20:18:23.720000000 sysevent.fs.zfs.scrub_finish
Feb 21 2019 20:18:23.764000000 sysevent.fs.zfs.scrub_start
Feb 21 2019 20:18:23.764000000 sysevent.fs.zfs.history_event
Feb 21 2019 20:18:23.908000000 sysevent.fs.zfs.history_event
...
Here is what is known so far: a scrub may issue ZIOs to a just-attached disk, which causes the described CHKSUM errors. To address this I implemented the following fix; with it, ZFS will not issue scrub ZIOs to a vdev that has missing data.
diff --git a/usr/src/uts/common/fs/zfs/vdev_mirror.c b/usr/src/uts/common/fs/zfs/vdev_mirror.c
index 4a869c9ea1..4843d2f7c8 100644
--- a/usr/src/uts/common/fs/zfs/vdev_mirror.c
+++ b/usr/src/uts/common/fs/zfs/vdev_mirror.c
@@ -445,6 +445,14 @@ vdev_mirror_io_start(zio_t *zio)
continue;
}
+		/*
+		 * The vdev may have been attached recently and its
+		 * resilver may not have finished yet, so skip it to
+		 * avoid CHKSUM errors.
+		 */
+		if (vdev_dtl_contains(mc->mc_vd, DTL_MISSING, zio->io_txg, 1))
+			continue;
+
if (spec_case && c != target)
continue;
I also found two code blocks where ZFS checks whether any vdev requires resilvering. The first one switches the code path from SCRUB to RESILVER when a vdev needs resilvering; the other does not make that switch. The following change fixes that:
diff --git a/usr/src/uts/common/fs/zfs/dsl_scan.c b/usr/src/uts/common/fs/zfs/dsl_scan.c
index f39afbb4cb..6950822846 100644
--- a/usr/src/uts/common/fs/zfs/dsl_scan.c
+++ b/usr/src/uts/common/fs/zfs/dsl_scan.c
@@ -470,6 +474,7 @@ dsl_scan_setup_sync(void *arg, dmu_tx_t *tx)
&scn->scn_phys.scn_min_txg, &scn->scn_phys.scn_max_txg)) {
spa_event_notify(spa, NULL, NULL,
ESC_ZFS_RESILVER_START);
+ scn->scn_phys.scn_func = POOL_SCAN_RESILVER;
} else {
spa_event_notify(spa, NULL, NULL, ESC_ZFS_SCRUB_START);
}
@@ -506,7 +511,8 @@ dsl_scan_setup_sync(void *arg, dmu_tx_t *tx)
spa_history_log_internal(spa, "scan setup", tx,
"func=%u mintxg=%llu maxtxg=%llu",
- *funcp, scn->scn_phys.scn_min_txg, scn->scn_phys.scn_max_txg);
+ scn->scn_phys.scn_func, scn->scn_phys.scn_min_txg,
+ scn->scn_phys.scn_max_txg);
}
/* ARGSUSED */
Unfortunately the changes didn't help. I still saw CHKSUM errors, but on another vdev. Later I found that all CHKSUM errors were related to blocks created in the very TXG that is currently syncing.
Finally I implemented the following solution:
diff --git a/usr/src/uts/common/fs/zfs/dsl_scan.c b/usr/src/uts/common/fs/zfs/dsl_scan.c
index f39afbb4cb..6950822846 100644
--- a/usr/src/uts/common/fs/zfs/dsl_scan.c
+++ b/usr/src/uts/common/fs/zfs/dsl_scan.c
@@ -428,6 +428,10 @@ dsl_scan_setup_check(void *arg, dmu_tx_t *tx)
if (dsl_scan_is_running(scn))
return (SET_ERROR(EBUSY));
+ if (dmu_tx_is_syncing(tx) && scn->scn_restart_txg != 0 &&
+ scn->scn_restart_txg > tx->tx_txg)
+ return (SET_ERROR(EAGAIN));
+
return (0);
}
With this change, a scrub started by a user is rescheduled if a vdev still requires resilvering.
With the last change my test VM ran for more than 2 days without any CHKSUM errors.
Any ideas, thoughts?
@tonyhutter pointed out that I may have had a similar problem, first described on the mailing list. I am using CentOS 7.6 with kernel 3.10.0-957.12.1.el7.x86_64 and ZFS/SPL version 0.7.13-1.
Yesterday I replaced a failed disk with path pci-0000:02:00.1-ata-2.0. As this chassis is not hot-plug capable, I powered off the machine and replaced the disk. On server restart, with autoreplace=on, zed immediately replaced the disk and a resilver started.
A zpool status issued just after booting showed 1.15K CKSUM errors on the replaced disk, pci-0000:02:00.1-ata-2.0, which was marked as DEGRADED. This surprised me: the disk was brand new, with absolutely no ZFS labels, so how/why did ZFS try to access it?
A subsequent scrub found no errors, but the CKSUM count remained at 1.15K until I issued a "zpool clear tank". It seems as if ZFS tried to read from the replaced disk (during pool import?), found it "corrupted" (i.e. it contains no valid data) and marked it as degraded.
I failed to replicate the issue in a VM, though; maybe it is timing sensitive (i.e. there is a race condition)? Thanks.
I'm seeing very similar behavior since upgrading to 0.8.2-1. One of my zpools, with 10TB drives less than a year old, reported checksum errors on all drives right after the upgrade (no errors ever before 0.8.2-1). I was also experiencing the restarting-resilver issue. Doing some digging, I saw a tip about stopping zed. Once I stopped zed, all my problems went away.
pool: tank
state: DEGRADED
status: One or more devices is currently being resilvered. The pool will
continue to function, possibly in a degraded state.
action: Wait for the resilver to complete.
scan: resilver in progress since Fri Oct 25 21:28:17 2019
72.8G scanned at 9.10G/s, 64.9G issued at 8.11G/s, 29.1T total
322M resilvered, 0.22% done, 0 days 01:01:13 to go
config:
NAME STATE READ WRITE CKSUM
tank DEGRADED 0 0 0
raidz2-0 DEGRADED 0 0 0
ata-ST10000NM0086-2AA101_ZA28MD7E DEGRADED 0 0 415 too many errors
ata-ST10000NM0086-2AA101_ZA28MCNG DEGRADED 0 0 64 too many errors
ata-ST10000NM0086-2AA101_ZA25NSFA DEGRADED 0 0 271 too many errors
ata-ST10000NM0086-2AA101_ZA28MD0H DEGRADED 0 0 704 too many errors (resilvering)
ata-ST10000NM0086-2AA101_ZA26F2MG DEGRADED 0 0 34 too many errors (resilvering)
ata-ST10000NM0086-2AA101_ZA26E1X0 DEGRADED 0 0 52 too many errors (resilvering)
ata-ST10000NM0086-2AA101_ZA26E1SS DEGRADED 0 0 57 too many errors
ata-ST10000NM0086-2AA101_ZA28M62Z ONLINE 0 0 4
ata-ST10000NM0086-2AA101_ZA26DN39 DEGRADED 0 0 59 too many errors (resilvering)
ata-ST10000NM0086-2AA101_ZA26E2M3 DEGRADED 0 0 44 too many errors
CentOS 7.7, kernel 3.10.0-1062.4.1.el7.x86_64.
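For anyone wanting to try the same workaround, zed is usually managed through systemd; a minimal sketch, assuming the standard zfs-zed unit name:

# Stop the ZFS event daemon for the current boot
systemctl stop zfs-zed.service
# Optionally keep it from starting again on reboot
systemctl disable zfs-zed.service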
This issue has been automatically marked as "stale" because it has not had any activity for a while. It will be closed in 90 days if no further activity occurs. Thank you for your contributions.
anti-stale-bot-post
Hi everyone.
I'm on Proxmox VE 6.3 with ZFS 0.8.5 and kernel 5.4.73, and I think I hit the same or a similar bug.
First I thought the RAM or the (brand new) disks were bad, so I tested each component separately without finding any issue. I then decided to test on a different server to rule out errors related to the storage components, but I still got errors. I then ran more tests on both servers with HDDs, SSDs and RAM disks, and all show the same problem.
~ # modprobe brd rd_nr=2 rd_size=10485760
~ # zpool create test2 mirror /dev/ram0 /dev/ram1
~ # zpool status
pool: test2
state: ONLINE
scan: none requested
config:
NAME STATE READ WRITE CKSUM
test2 ONLINE 0 0 0
mirror-0 ONLINE 0 0 0
ram0 ONLINE 0 0 0
ram1 ONLINE 0 0 0
errors: No known data errors
~ # zfs create -V 5G -o volblocksize=8k test2/testvol
~ # dd if=/dev/urandom of=/dev/zvol/test2/testvol bs=1M status=progress
~ # zpool status
pool: test2
state: ONLINE
scan: none requested
config:
NAME STATE READ WRITE CKSUM
test2 ONLINE 0 0 0
mirror-0 ONLINE 0 0 0
ram0 ONLINE 0 0 0
ram1 ONLINE 0 0 0
errors: No known data errors
~ # zpool scrub test2
~ # zpool detach test2 ram1
~ # zpool add test2 ram1
~ # zpool remove test2 ram0
~ # zpool attach test2 ram1 ram0
~ # zpool status
pool: test2
state: DEGRADED
status: One or more devices has experienced an unrecoverable error. An
attempt was made to correct the error. Applications are unaffected.
action: Determine if the device needs to be replaced, and clear the errors
using 'zpool clear' or replace the device with 'zpool replace'.
see: http://zfsonlinux.org/msg/ZFS-8000-9P
scan: resilvered 5.11G in 0 days 00:00:15 with 0 errors on Fri Nov 27 07:57:58 2020
remove: Removal of vdev 0 copied 5.05G in 0h0m, completed on Fri Nov 27 07:57:32 2020
10.2K memory used for removed device mappings
config:
NAME STATE READ WRITE CKSUM
test2 DEGRADED 0 0 0
mirror-1 DEGRADED 0 0 0
ram1 ONLINE 0 0 0
ram0 DEGRADED 0 0 18 too many errors
errors: No known data errors
Here are the zpool events from a third test where it happened on the second try or so (not sure anymore :) ). test2_ereport.txt
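To rerun this sequence from a clean state, the RAM-disk pool can be torn down between attempts; a minimal sketch, assuming nothing else is using the brd devices:

# Destroy the test pool and release the RAM disks
zpool destroy test2
rmmod brd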
I may have seen the same issue under Rocky Linux 8.4 and ZFS 2.1.0
[root@zfsscratch ~]# zpool replace scratch 36000402009e01d5d415ae51700000000 36000402002e022454a69ead900000000
cannot open '/dev/mapper/36000402002e022454a69ead900000000': Device or resource busy
  mirror-40                              ONLINE       0     0     0
    36000402002e022454a69908c00000000    ONLINE       0     0     0
    spare-1                              ONLINE       0     0     0
      36000402009e01d5d415ae51700000000  ONLINE       2     0     1
      36000402002e022454a69ead900000000  ONLINE       0     0    97
...
spares
...
  36000402002e022454a69ead900000000      INUSE     currently in use
The resilver finished and I am now running a zpool clear && zpool scrub.
Hi,
I experienced the same checksum error report after attaching a brand-new disk to an existing single-disk pool. The system runs Debian 11 with ZFS 2.0.6.
An automatic scrub was running when I zpool attach'ed the device. I did not stop it manually, as the zpool-attach(8) manual says:
new_device begins to resilver immediately and any running scrub is cancelled.
The brand-new disk is ata-HGST_HUS728T8TALE6L4_VRH4JTEK. Here is the command I ran (from zpool history):
2021-11-14.02:26:53 zpool attach tank ata-TOSHIBA_HDWG480_71Q0A07SFR0H /dev/disk/by-id/ata-HGST_HUS728T8TALE6L4_VRH4JTEK
I checked the status a few minutes after attaching the device; it was already reporting checksum errors:
# zpool status tank
pool: tank
state: ONLINE
status: One or more devices is currently being resilvered. The pool will
continue to function, possibly in a degraded state.
action: Wait for the resilver to complete.
scan: resilver in progress since Sun Nov 14 02:26:42 2021
1.25T scanned at 5.17G/s, 33.9G issued at 140M/s, 2.43T total
33.9G resilvered, 1.36% done, 04:59:16 to go
config:
NAME STATE READ WRITE CKSUM
tank ONLINE 0 0 0
mirror-0 ONLINE 0 0 0
ata-TOSHIBA_HDWG480_71Q0A07SFR0H ONLINE 0 0 0
ata-HGST_HUS728T8TALE6L4_VRH4JTEK ONLINE 0 0 2.72K (resilvering)
errors: No known data errors
The resilvering has now finished; the checksum error count did not change:
# zpool status
pool: tank
state: ONLINE
status: One or more devices has experienced an unrecoverable error. An
attempt was made to correct the error. Applications are unaffected.
action: Determine if the device needs to be replaced, and clear the errors
using 'zpool clear' or replace the device with 'zpool replace'.
see: https://openzfs.github.io/openzfs-docs/msg/ZFS-8000-9P
scan: resilvered 2.44T in 04:17:38 with 0 errors on Sun Nov 14 06:44:20 2021
config:
NAME STATE READ WRITE CKSUM
tank ONLINE 0 0 0
mirror-0 ONLINE 0 0 0
ata-TOSHIBA_HDWG480_71Q0A07SFR0H ONLINE 0 0 0
ata-HGST_HUS728T8TALE6L4_VRH4JTEK ONLINE 0 0 2.72K
errors: No known data errors
The events:
Nov 14 2021 00:24:01.940729556 sysevent.fs.zfs.scrub_start
version = 0x0
class = "sysevent.fs.zfs.scrub_start"
pool = "tank"
pool_guid = 0xbb90af7e6f5976c2
pool_state = 0x0
pool_context = 0x0
time = 0x61904911 0x381264d4
eid = 0xf
Nov 14 2021 00:24:01.940729556 sysevent.fs.zfs.history_event
version = 0x0
class = "sysevent.fs.zfs.history_event"
pool = "tank"
pool_guid = 0xbb90af7e6f5976c2
pool_state = 0x0
pool_context = 0x0
history_hostname = "larco"
history_internal_str = "func=1 mintxg=0 maxtxg=307454"
history_internal_name = "scan setup"
history_txg = 0x4b0fe
history_time = 0x61904911
time = 0x61904911 0x381264d4
eid = 0x10
Nov 14 2021 02:26:35.458148864 sysevent.fs.zfs.vdev_attach
version = 0x0
class = "sysevent.fs.zfs.vdev_attach"
pool = "tank"
pool_guid = 0xbb90af7e6f5976c2
pool_state = 0x0
pool_context = 0x0
vdev_guid = 0xa35d7634b2634316
vdev_state = "ONLINE" (0x7)
vdev_path = "/dev/disk/by-id/ata-HGST_HUS728T8TALE6L4_VRH4JTEK-part1"
vdev_devid = "ata-HGST_HUS728T8TALE6L4_VRH4JTEK-part1"
time = 0x619065cb 0x1b4ecc00
eid = 0x11
Nov 14 2021 02:26:42.858314108 sysevent.fs.zfs.history_event
version = 0x0
class = "sysevent.fs.zfs.history_event"
pool = "tank"
pool_guid = 0xbb90af7e6f5976c2
pool_state = 0x0
pool_context = 0x0
history_hostname = "larco"
history_internal_str = "errors=0"
history_internal_name = "scan aborted, restarting"
history_txg = 0x4b644
history_time = 0x619065d2
time = 0x619065d2 0x3328d57c
eid = 0x12
Nov 14 2021 02:26:42.858314108 sysevent.fs.zfs.history_event
version = 0x0
class = "sysevent.fs.zfs.history_event"
pool = "tank"
pool_guid = 0xbb90af7e6f5976c2
pool_state = 0x0
pool_context = 0x0
history_hostname = "larco"
history_internal_str = "errors=0"
history_internal_name = "starting deferred resilver"
history_txg = 0x4b644
history_time = 0x619065d2
time = 0x619065d2 0x3328d57c
eid = 0x13
Nov 14 2021 02:26:42.858314108 sysevent.fs.zfs.resilver_start
version = 0x0
class = "sysevent.fs.zfs.resilver_start"
pool = "tank"
pool_guid = 0xbb90af7e6f5976c2
pool_state = 0x0
pool_context = 0x0
resilver_type = "healing"
time = 0x619065d2 0x3328d57c
eid = 0x14
Nov 14 2021 02:26:42.858314108 sysevent.fs.zfs.history_event
version = 0x0
class = "sysevent.fs.zfs.history_event"
pool = "tank"
pool_guid = 0xbb90af7e6f5976c2
pool_state = 0x0
pool_context = 0x0
history_hostname = "larco"
history_internal_str = "func=2 mintxg=3 maxtxg=308804"
history_internal_name = "scan setup"
history_txg = 0x4b644
history_time = 0x619065d2
time = 0x619065d2 0x3328d57c
eid = 0x15
Nov 14 2021 02:26:46.318391389 sysevent.fs.zfs.config_sync
version = 0x0
class = "sysevent.fs.zfs.config_sync"
pool = "tank"
pool_guid = 0xbb90af7e6f5976c2
pool_state = 0x0
pool_context = 0x0
time = 0x619065d6 0x12fa445d
eid = 0x16
Nov 14 2021 02:26:53.542552777 sysevent.fs.zfs.history_event
version = 0x0
class = "sysevent.fs.zfs.history_event"
pool = "tank"
pool_guid = 0xbb90af7e6f5976c2
pool_state = 0x0
pool_context = 0x0
history_hostname = "larco"
history_internal_str = "attach vdev=/dev/disk/by-id/ata-HGST_HUS728T8TALE6L4_VRH4JTEK-part1 to vdev=/dev/disk/by-id/ata-TOSHIBA_HDWG480_71Q0A07SFR0H-part1"
history_internal_name = "vdev attach"
history_txg = 0x4b646
history_time = 0x619065dd
time = 0x619065dd 0x2056b2c9
eid = 0x17
Nov 14 2021 06:44:20.973882171 sysevent.fs.zfs.history_event
version = 0x0
class = "sysevent.fs.zfs.history_event"
pool = "tank"
pool_guid = 0xbb90af7e6f5976c2
pool_state = 0x0
pool_context = 0x0
history_hostname = "larco"
history_internal_str = "errors=0"
history_internal_name = "scan done"
history_txg = 0x4c11f
history_time = 0x6190a234
time = 0x6190a234 0x3a0c433b
eid = 0x18
Nov 14 2021 06:44:20.973882171 sysevent.fs.zfs.resilver_finish
version = 0x0
class = "sysevent.fs.zfs.resilver_finish"
pool = "tank"
pool_guid = 0xbb90af7e6f5976c2
pool_state = 0x0
pool_context = 0x0
resilver_type = "healing"
time = 0x6190a234 0x3a0c433b
eid = 0x19
For the record, I then ran zpool clear followed by zpool scrub on my pool. No errors were reported:
# zpool status
pool: tank
state: ONLINE
scan: scrub repaired 0B in 04:02:52 with 0 errors on Mon Nov 15 04:13:53 2021
config:
NAME STATE READ WRITE CKSUM
tank ONLINE 0 0 0
mirror-0 ONLINE 0 0 0
ata-TOSHIBA_HDWG480_71Q0A07SFR0H ONLINE 0 0 0
ata-HGST_HUS728T8TALE6L4_VRH4JTEK ONLINE 0 0 0
errors: No known data errors
This issue has been automatically marked as "stale" because it has not had any activity for a while. It will be closed in 90 days if no further activity occurs. Thank you for your contributions.
I too am experiencing this error. A pool containing one disk had another disk attached to it (mirror), and both resilvering and scrubbing cause checksum errors on the new disk. I can repeat the scrub numerous times and it always finds new checksum errors every scan.
I've reseated the drive on the backplane a few times with no change. SMART data shows absolutely nothing wrong with either disk. Previous disks connected to this backplane slot have never caused errors. Running Debian 11, ZFS 2.1.4.
pool: tank
state: DEGRADED
status: One or more devices has experienced an unrecoverable error. An
attempt was made to correct the error. Applications are unaffected.
action: Determine if the device needs to be replaced, and clear the errors
using 'zpool clear' or replace the device with 'zpool replace'.
see: https://openzfs.github.io/openzfs-docs/msg/ZFS-8000-9P
scan: scrub repaired 16K in 00:22:01 with 0 errors on Fri Jan 6 00:07:10 2023
config:
NAME STATE READ WRITE CKSUM
tank DEGRADED 0 0 0
mirror-0 DEGRADED 0 0 0
wwn-0x55cd2e404c47175f-part3 ONLINE 0 0 0
wwn-0x55cd2e404c471471-part3 DEGRADED 0 0 166 too many errors
errors: No known data errors
@c89williams rather than re-seating the disk, try zpool clear tank; zpool scrub tank. If no errors are shown after the scrub, then your disk is OK.
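A minimal sketch of that sequence, assuming zpool wait is available (OpenZFS 2.0 and later):

zpool clear tank          # reset the error counters
zpool scrub tank          # re-read and verify every block
zpool wait -t scrub tank  # block until the scrub completes
zpool status tank         # CKSUM should stay at 0 if the disk is healthy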
@shodanshok I have tried what you suggested, clearing the error counters and re-scrubbing, but checksum errors continue to occur.
@c89williams in this case, I don't think you have the same issue as described here. Have you tried rebooting your machine and re-scrubbing the pool?