Removing mirrored special device from single vdev causes irrecoverable kernel panic
System information
| Type | Version/Name |
|---|---|
| Distribution Name | Debian |
| Distribution Version | 11 |
| Linux Kernel | 5.9.0-1 |
| Architecture | amd64 |
| ZFS Version | 0.8.5-2 |
| SPL Version | 0.8.5-2 |
Describe the problem you're observing
Given a single-vdev zpool, I added a mirrored special device:
# zpool status -v
pool: tank
state: ONLINE
scan: none requested
config:
NAME STATE READ WRITE CKSUM
tank ONLINE 0 0 0
ata-Micron_5210_XXXXXXXXXXXXX_XXXXXXXXXXXX-part1 ONLINE 0 0 0
special
mirror-1 ONLINE 0 0 0
ata-CT240BX500SSD1_XXXXXXXXXXXX ONLINE 0 0 0
ata-CT240BX500SSD1_YYYYYYYYYYYY ONLINE 0 0 0
This worked fine. After this I tried to remove it:
# zpool remove tank mirror-1 -n
Memory that will be used after removing mirror-1: 12.7K
# zpool remove tank mirror-1
Message from syslogd@localhost at Nov 9 13:24:29 ...
kernel:[ 4583.208631] VERIFY3(DVA_GET_ASIZE(&dst) == size) failed (102400 == 101376)
Message from syslogd@localhost at Nov 9 13:24:29 ...
kernel:[ 4583.208665] PANIC at vdev_removal.c:1039:spa_vdev_copy_segment()
[ +0.000003] ret_from_fork+0x22/0x30
[Nov 9 13:26] INFO: task txg_quiesce:1930 blocked for more than 120 seconds.
[ +0.000034] Tainted: P OE 5.9.0-1-amd64 #1 Debian 5.9.1-1
[ +0.000024] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ +0.000027] task:txg_quiesce state:D stack: 0 pid: 1930 ppid: 2 flags:0x00004000
[ +0.000003] Call Trace:
[ +0.000007] __schedule+0x281/0x8a0
[ +0.000003] schedule+0x4a/0xb0
[ +0.000007] cv_wait_common+0xf8/0x130 [spl]
[ +0.000004] ? add_wait_queue_exclusive+0x70/0x70
[ +0.000059] txg_quiesce_thread+0x2b1/0x3a0 [zfs]
[ +0.000053] ? txg_sync_thread+0x490/0x490 [zfs]
[ +0.000007] thread_generic_wrapper+0x6f/0x80 [spl]
[ +0.000005] ? __thread_exit+0x20/0x20 [spl]
[ +0.000002] kthread+0x11b/0x140
[ +0.000001] ? __kthread_bind_mask+0x60/0x60
[ +0.000003] ret_from_fork+0x22/0x30
[this repeats]
Performing a reboot makes the boot hang with the above; the same happens when trying to import the zpool again.
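A note for anyone stuck in the same boot hang: on Debian, pools listed in /etc/zfs/zpool.cache are auto-imported at boot by zfs-import-cache.service, so one possible workaround, assuming the root filesystem is not on the affected pool, is to boot a rescue/live system and move the cache file aside so the damaged pool is left alone until you are ready to attempt a manual recovery import. The device path below is a placeholder:
# from a rescue/live environment, with the installed root on a placeholder device
mount /dev/sdXN /mnt
mv /mnt/etc/zfs/zpool.cache /mnt/etc/zfs/zpool.cache.bak
umount /mnt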
Just a first guess: Do your special and regular vdevs have different sector sizes (ashift)? Maybe there's a bug in how we check if device removal is allowed. We should not be allowing removal if it would require moving blocks between devices with different ashift. cc @don-brady
If they do have different ashifts, the removal should fail due to this code in spa_vdev_remove_top_check():
/*
* A removed special/dedup vdev must have same ashift as normal class.
*/
ASSERT(!vd->vdev_islog);
if (vd->vdev_alloc_bias != VDEV_BIAS_NONE &&
vd->vdev_ashift != spa->spa_max_ashift) {
return (SET_ERROR(EINVAL));
}
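For reference, one way to confirm the per-vdev ashift is to read it from the pool configuration or the on-disk labels with zdb; the pool name and device name below are taken from this report and are otherwise placeholders:
# ashift as recorded in the cached pool config, with surrounding vdev context
zdb -C tank | grep -B 10 ashift
# ashift as recorded in a single device's on-disk label
zdb -l /dev/disk/by-id/ata-CT240BX500SSD1_XXXXXXXXXXXX | grep ashift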
@ahrens I didn't realize they differed but you're completely right; the regular vdev is 12 and the special vdevs are 9.
I was warned when trying to create the pool that I couldn't add the special device because of a mismatched replication level (regular is single, special is mirror), which prompted me to use the -f flag. I guess it would normally have warned me about the mismatched ashift when adding it, but that warning was silenced by the replication-level one?
Maybe there's a separate minor issue in that it shouldn't require -f to add a mirror special to a single-device zpool?
Given the situation now, do you think there's any hope I will be able to restore the zpool at least to a state where I can get the data off it, or is this just a lesson to always take backups before zpool operations?
@3nprob You can try this:
zdb -lu /dev/disk/by-id/your_disk_or_partition | grep txg
Then you need to find the uberblock with a timestamp from before the failed action. Be careful: the output is not sorted by time (a small sorting pipeline is sketched after this comment). Then:
echo "1" > /sys/module/zfs/parameters/zfs_recover
echo "2" > /sys/module/zfs/parameters/zfs_max_missing_tvds
zpool import -FT number_of_last_good_uberblock_from_zdb_lu -m -o readonly=on -d /dev/disk/by-id some_GUID
I'm not sure which GUID to use: the entire pool's, or (in your case) ata-Micron_5210_XXXXXXXXXXXXX_XXXXXXXXXXXX-part1. So, good luck.
If you are lucky and the import starts, then 99.99% everything will be fine. I mean, you CAN get the data out of there. But keep in mind that once the import starts there will be a complete "pool scrub". In my case, 3.2 TB of data took 8 hours to check. So: patience and no panic.
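Since the zdb output above is not sorted, a small pipeline like the following can pair each uberblock's txg with its timestamp and sort by txg; the disk path is the same placeholder as above, and this is only a convenience sketch:
zdb -lu /dev/disk/by-id/your_disk_or_partition \
  | awk '/txg = /{t=$3} /timestamp = /{print t, $0}' \
  | sort -n
The highest txg whose timestamp is from before the failed removal is the one to pass to zpool import -FT.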
@KosmiK2001 Thank you for the input. Here I am taking an uberblock that I can find from before the attempted removal, and that I also find matching when checking part 2 on both one of the new special disks and the Micron_5210.
When trying the import:
# zpool import -FT 29 -m -o readonly=on -d /dev/disk/by-id/ata-Micron_5210_XXX_YYY-part1
pool: tank
id: 1234
state: FAULTED
status: The pool metadata is corrupted.
action: The pool cannot be imported due to damaged devices or data.
The pool may be active on another system, but can be imported using
the '-f' flag.
see: http://zfsonlinux.org/msg/ZFS-8000-72
config:
tank FAULTED corrupted data
ata-Micron_5210_XXX-YYY-part1 ONLINE
Did I get that right? I also tried zpool import -f -FT 29 -m -o readonly=on -d /dev/disk/by-id/ata-Micron_5210_XXX_YYY-part1, with the same output.
Oops... I was a little wrong:
# zpool import
pool: zSSD
id: 5120936506406348611
state: ONLINE
action: The pool can be imported using its name or numeric identifier.
config:
zSSD ONLINE
sdi4 ONLINE
zdb -lu /dev/sdi4
This prints a lot of output, for example:
Uberblock[31]
magic = 0000000000bab10c
version = 5000
txg = 228543
guid_sum = 5324284886383883725
timestamp = 1605090316 UTC = Wed Nov 11 13:25:16 2020
mmp_magic = 00000000a11cea11
mmp_delay = 0
mmp_valid = 0
checkpoint_txg = 0
labels = 0 1 2 3
You NEED the txg = xxxxxxx value (the TXG), not the number of the uberblock!!
zpool import -FT 229136 -m -o readonly=on -d /dev/disk/by-id 5120936506406348611
# zpool import
pool: tank
id: 123456789
state: ONLINE
action: The pool can be imported using its name or numeric identifier.
config:
tank ONLINE
ata-Micron_5210_XXXXX_YYYY-part1 ONLINE
special
mirror-1 ONLINE
ata-CT240BX500SSD1_DEAD ONLINE
ata-CT240BX500SSD1_BEEF ONLINE
# zdb -lu /dev/disk/by-id/ata-CT240BX500SSD1_DEAD-part1 | grep '\[28\]' -A10
Uberblock[28]
magic = 0000000000bab10c
version = 5000
txg = 37660
guid_sum = 15835402956303222107
timestamp = 1604927937 UTC = Mon Nov 9 22:18:57 2020
mmp_magic = 00000000a11cea11
mmp_delay = 0
mmp_valid = 0
checkpoint_txg = 0
labels = 0 1 2 3
# zpool import -f -FT 37660 -m -o readonly=on -d /dev/disk/by-id 5823545018697606436
cannot import 'king': I/O error
Destroy and re-create the pool from
a backup source.
# zpool import -f -FT 37660 -m -o readonly=on -d /dev/disk/by-id/ata-Micron_5210_XXXXX_YYYY-part1 123456789
cannot import 'tank': one or more devices is currently unavailable
(All three devices are connected)
Use zdb -lu on ata-Micron_5210_XXXXX_YYYY-part1
Btw, it says state: ONLINE. Try zpool import -d /dev/disk/by-id 123456789 -R /somewhere
The txg was visible on both the special vdevs and the main vdev, so it's the same there.
Doing the -R made it mount actually!!! Thanks a bunch @KosmiK2001 , fingers crossed I can actually copy off the 4TB as well <3
@3nprob, Well what can I say, you are a very lucky guy! And now, take out the data! And don't shoot yourself in the knee again.
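For the actual copy, with the pool imported read-only under the altroot, something along these lines should work; the paths and pool names are placeholders, and note that new snapshots cannot be created on a read-only pool, so zfs send is only an option if a usable snapshot already exists:
# plain file-level copy out of the altroot mountpoint
rsync -aHAX --info=progress2 /somewhere/tank/ /path/to/backup/
# or, if a snapshot from before the failure survived, replicate it to another pool
zfs send -R tank@some_existing_snapshot | zfs receive -F backuppool/tank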
@ahrens
If they do have different ashifts, the removal should fail
But why? This doesn't look good: it leaves a user without an option to remove a vdev that would otherwise be removable. He may have expected, and relied on, the option to remove it. He may have added the device with the wrong ashift by mistake. It should probably fail to add it in the first place, or a HUGE WARNING should be shown informing the user that the device will be stuck in the pool forever.
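Until the add-time check is improved, one way to avoid this trap is to confirm the pool's ashift first and pass it explicitly when adding the special vdev; a sketch, assuming a pool named tank already running with ashift=12 and placeholder device names:
# confirm the ashift used by the existing top-level vdev(s)
zdb -C tank | grep ashift
# force the new special mirror to be created with the matching ashift
zpool add -o ashift=12 tank special mirror /dev/disk/by-id/diskA /dev/disk/by-id/diskB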
@KosmiK2001
don't shoot yourself in the knee again
It doesn't look like he did anything that is known to be bad. He just fell into a trap.
This issue has been automatically marked as "stale" because it has not had any activity for a while. It will be closed in 90 days if no further activity occurs. Thank you for your contributions.
is this fixed?
This issue has been automatically marked as "stale" because it has not had any activity for a while. It will be closed in 90 days if no further activity occurs. Thank you for your contributions.
is this fixed?