
Removing mirrored special device from single vdev causes irrecoverable kernel panic

Open · 3nprob opened this issue on Nov 9, 2020 · 14 comments

System information

Type | Version/Name
--- | ---
Distribution Name | Debian
Distribution Version | 11
Linux Kernel | 5.9.0-1
Architecture | amd64
ZFS Version | 0.8.5-2
SPL Version | 0.8.5-2

Describe the problem you're observing

Given a single vdev zpool, I added a mirrored special device:

# zpool status -v
  pool: tank
 state: ONLINE
  scan: none requested
config:
 
        NAME                                                STATE     READ WRITE CKSUM
        tank                                                ONLINE       0     0     0
          ata-Micron_5210_XXXXXXXXXXXXX_XXXXXXXXXXXX-part1  ONLINE       0     0     0
        special
          mirror-1                                          ONLINE       0     0     0
            ata-CT240BX500SSD1_XXXXXXXXXXXX                 ONLINE       0     0     0
            ata-CT240BX500SSD1_YYYYYYYYYYYY                 ONLINE       0     0     0

This worked fine. After this I tried to remove it:

# zpool remove tank mirror-1 -n
Memory that will be used after removing mirror-1: 12.7K

# zpool remove tank mirror-1

Message from syslogd@localhost at Nov  9 13:24:29 ...
 kernel:[ 4583.208631] VERIFY3(DVA_GET_ASIZE(&dst) == size) failed (102400 == 101376)

Message from syslogd@localhost at Nov  9 13:24:29 ...
 kernel:[ 4583.208665] PANIC at vdev_removal.c:1039:spa_vdev_copy_segment()
[  +0.000003]  ret_from_fork+0x22/0x30
[Nov 9 13:26] INFO: task txg_quiesce:1930 blocked for more than 120 seconds.
[  +0.000034]       Tainted: P           OE     5.9.0-1-amd64 #1 Debian 5.9.1-1
[  +0.000024] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[  +0.000027] task:txg_quiesce     state:D stack:    0 pid: 1930 ppid:     2 flags:0x00004000
[  +0.000003] Call Trace:
[  +0.000007]  __schedule+0x281/0x8a0
[  +0.000003]  schedule+0x4a/0xb0
[  +0.000007]  cv_wait_common+0xf8/0x130 [spl]
[  +0.000004]  ? add_wait_queue_exclusive+0x70/0x70
[  +0.000059]  txg_quiesce_thread+0x2b1/0x3a0 [zfs]
[  +0.000053]  ? txg_sync_thread+0x490/0x490 [zfs]
[  +0.000007]  thread_generic_wrapper+0x6f/0x80 [spl]
[  +0.000005]  ? __thread_exit+0x20/0x20 [spl]
[  +0.000002]  kthread+0x11b/0x140
[  +0.000001]  ? __kthread_bind_mask+0x60/0x60
[  +0.000003]  ret_from_fork+0x22/0x30
[this repeats]

Performing a reboot makes the boot hang with the above. Same if trying to import the zpool again.
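For what it's worth, the two numbers in the VERIFY3 message line up exactly with sector-size rounding. A minimal sketch, assuming (this is an inference, not taken from the ZFS source) that the mismatch comes from rounding the same allocation size to two different sector sizes:

```shell
# Assumption: the failed VERIFY3's "size" operand rounds up differently
# under different ashift (sector size) values.
size=101376  # the right-hand value in the VERIFY3 message
for ashift in 9 12; do
  sector=$((1 << ashift))                            # 512 or 4096 bytes
  rounded=$(( (size + sector - 1) / sector * sector ))  # round up to sector
  echo "ashift=$ashift: $size rounds up to $rounded"
done
# With ashift=9 the size is already aligned (101376 = 198 * 512);
# with ashift=12 it rounds up to 102400 -- the two values in the panic.
```

This is consistent with the allocation having been made on a 512-byte-sector (ashift=9) device and being copied to a 4K-sector (ashift=12) device during removal.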

3nprob avatar Nov 09 '20 18:11 3nprob

Just a first guess: Do your special and regular vdevs have different sector sizes (ashift)? Maybe there's a bug in how we check if device removal is allowed. We should not be allowing removal if it would require moving blocks between devices with different ashift. cc @don-brady

If they do have different ashifts, the removal should fail due to this code in spa_vdev_remove_top_check():

	/*
	 * A removed special/dedup vdev must have same ashift as normal class.
	 */
	ASSERT(!vd->vdev_islog);
	if (vd->vdev_alloc_bias != VDEV_BIAS_NONE &&
	    vd->vdev_ashift != spa->spa_max_ashift) {
		return (SET_ERROR(EINVAL));
	}
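Before attempting a removal, the per-vdev ashift values can be compared by hand. A hedged sketch that parses captured `zdb -C`-style output; the sample lines and the `config.txt` filename here are hypothetical, mirroring this issue's 12-vs-9 mismatch:

```shell
# Hypothetical sample of `zdb -C <pool>` output lines; on a real system,
# capture it first with something like: zdb -C tank > config.txt
cat > config.txt <<'EOF'
            ashift: 12
            ashift: 9
EOF
# List the distinct ashift values. More than one distinct value means a
# top-level removal would have to move blocks between vdevs with different
# sector sizes -- the condition the check above is meant to reject.
awk '/ashift:/ {print $2}' config.txt | sort -un
```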

ahrens avatar Nov 09 '20 19:11 ahrens

@ahrens I didn't realize they differed but you're completely right; regular is 12 and special are 9.

I was warned when adding the special device that I couldn't add it because of the mismatched replication level (regular is single, special is mirror), which prompted me to use the -f flag. I guess it would normally also have warned me about the mismatched ashift, but that warning was silenced by the geometry warning?

Maybe there's a separate minor issue in that it shouldn't require -f to add a mirror special to a single device zpool?

Given the situation now, do you think there's any hope I'll be able to restore the zpool at least to a state where I can get the data off it, or is this just a lesson to always take backups before zpool operations?

3nprob avatar Nov 10 '20 02:11 3nprob

@3nprob You can try this:

zdb -lu /dev/disk/by-id/your_disk_or_partition | grep txg

Then you need to find the uberblock with a timestamp from before the failed action. Be careful: the output is not sorted by time. Then:

echo "1" > /sys/module/zfs/parameters/zfs_recover
echo "2" > /sys/module/zfs/parameters/zfs_max_missing_tvds
zpool import -FT number_of_last_good_uberblock_from_zdb_lu -m -o readonly=on -d /dev/disk/by-id some_GUID
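Since `zdb -lu` does not print uberblocks in time order, sorting the txg lines makes the candidates easier to pick. A small sketch on hypothetical captured output (the txg values and the `labels.txt` filename are invented for illustration):

```shell
# Hypothetical excerpt of `zdb -lu` output; on the real system, capture it
# with something like: zdb -lu /dev/disk/by-id/<disk> > labels.txt
cat > labels.txt <<'EOF'
        txg = 37661
        txg = 37658
        txg = 37660
EOF
# Candidate txgs, oldest first; pick the newest one whose timestamp is from
# before the failed operation.
awk '/txg = / {print $3}' labels.txt | sort -n
```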

I'm not sure which GUID to use: the entire pool's, or (in your case) that of ata-Micron_5210_XXXXXXXXXXXXX_XXXXXXXXXXXX-part1. So, good luck.

If you are lucky and the import starts, then 99.99% everything will be fine. I mean, you CAN get the data out of there. But keep in mind that after the import starts there will be a complete "pool scrub". In my case, checking 3.2TB of data took 8 hours. So: patience, and no panic.

KosmiK2001 avatar Nov 11 '20 22:11 KosmiK2001

@KosmiK2001 Thank you for the input. I picked an uberblock from before the attempted removal, one that I also find matching when checking partition 2 on both one of the new special disks and the Micron_5210.

When trying the import:

# zpool import -FT 29 -m -o readonly=on -d /dev/disk/by-id/ata-Micron_5210_XXX_YYY-part1
   pool: tank
     id: 1234
  state: FAULTED
 status: The pool metadata is corrupted.
 action: The pool cannot be imported due to damaged devices or data.
        The pool may be active on another system, but can be imported using
        the '-f' flag.
   see: http://zfsonlinux.org/msg/ZFS-8000-72
 config:

        tank                                                FAULTED  corrupted data
          ata-Micron_5210_XXX-YYY-part1  ONLINE

Did I get that right? I also tried zpool import -f -FT 29 -m -o readonly=on -d /dev/disk/by-id/ata-Micron_5210_XXX_YYY-part1, with the same output.

3nprob avatar Nov 12 '20 05:11 3nprob

Oops.. I was a little wrong:

# zpool import
   pool: zSSD
     id: 5120936506406348611
  state: ONLINE
 action: The pool can be imported using its name or numeric identifier.
 config:

        zSSD        ONLINE
          sdi4      ONLINE

# zdb -lu /dev/sdi4

...prints a lot of output, e.g.:

Uberblock[31]
        magic = 0000000000bab10c
        version = 5000
        txg = 228543
        guid_sum = 5324284886383883725
        timestamp = 1605090316 UTC = Wed Nov 11 13:25:16 2020
        mmp_magic = 00000000a11cea11
        mmp_delay = 0
        mmp_valid = 0
        checkpoint_txg = 0
        labels = 0 1 2 3

You NEED the txg = xxxxxx value, not the number of the uberblock!!

zpool import -FT 229136 -m -o readonly=on -d /dev/disk/by-id 5120936506406348611
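To make the txg-vs-slot-index distinction concrete, each uberblock's txg can be paired with its timestamp. A sketch over hypothetical `zdb -lu` output (the values and the `uberblocks.txt` filename are invented for illustration):

```shell
# Hypothetical excerpt of `zdb -lu` output.
cat > uberblocks.txt <<'EOF'
Uberblock[31]
        txg = 228543
        timestamp = 1605090316 UTC = Wed Nov 11 13:25:16 2020
Uberblock[30]
        txg = 228542
        timestamp = 1605090311 UTC = Wed Nov 11 13:25:11 2020
EOF
# Print "txg timestamp" pairs. The value passed to `zpool import -T` is the
# txg number, never the [N] slot index in Uberblock[N].
awk '/txg = / {txg=$3} /timestamp = / {print txg, $3}' uberblocks.txt
```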

KosmiK2001 avatar Nov 12 '20 08:11 KosmiK2001

# zpool import
   pool: tank
     id: 123456789
  state: ONLINE
 action: The pool can be imported using its name or numeric identifier.
 config:

        tank                                                ONLINE
          ata-Micron_5210_XXXXX_YYYY-part1  ONLINE
        special
          mirror-1                                          ONLINE
            ata-CT240BX500SSD1_DEAD                 ONLINE
            ata-CT240BX500SSD1_BEEF                 ONLINE
# zdb -lu /dev/disk/by-id/ata-CT240BX500SSD1_DEAD-part1 | grep '\[28\]' -A10
    Uberblock[28]
        magic = 0000000000bab10c
        version = 5000
        txg = 37660
        guid_sum = 15835402956303222107
        timestamp = 1604927937 UTC = Mon Nov  9 22:18:57 2020
        mmp_magic = 00000000a11cea11
        mmp_delay = 0
        mmp_valid = 0
        checkpoint_txg = 0
        labels = 0 1 2 3

# zpool import -f -FT 37660 -m -o readonly=on -d /dev/disk/by-id 5823545018697606436
cannot import 'king': I/O error
        Destroy and re-create the pool from
        a backup source.

# zpool import -f -FT 37660 -m -o readonly=on -d /dev/disk/by-id/ata-Micron_5210_XXXXX_YYYY-part1 123456789
cannot import 'tank': one or more devices is currently unavailable

(All three devices are connected)

3nprob avatar Nov 12 '20 12:11 3nprob

Use zdb -lu on ata-Micron_5210_XXXXX_YYYY-part1

KosmiK2001 avatar Nov 12 '20 12:11 KosmiK2001

btw, state: ONLINE. try zpool import -d /dev/disk/by-id 123456789 -R /somewhere

KosmiK2001 avatar Nov 12 '20 12:11 KosmiK2001

The txg was visible on both the special vdevs and the main, so same there. Doing the -R made it mount actually!!! Thanks a bunch @KosmiK2001 , fingers crossed I can actually copy off the 4TB as well <3

3nprob avatar Nov 12 '20 12:11 3nprob

@3nprob, Well what can I say, you are a very lucky guy! And now, take out the data! And don't shoot yourself in the knee again.

KosmiK2001 avatar Nov 12 '20 12:11 KosmiK2001

@ahrens

If they do have different ashift's, the removal should fail

But why? It doesn't seem good to leave a user without the option to remove a vdev that would otherwise be removable. They may have expected, and relied on, the option to remove it. They may have added the device with the wrong ashift by mistake. It should probably fail to add it in the first place, or a HUGE WARNING should be shown informing the user that the device will be stuck in the pool forever.

@KosmiK2001

don't shoot yourself in the knee again

It doesn't look like he did something that is known to be bad. He just fell into a trap.

SSDpreowner avatar Nov 24 '20 05:11 SSDpreowner

This issue has been automatically marked as "stale" because it has not had any activity for a while. It will be closed in 90 days if no further activity occurs. Thank you for your contributions.

stale[bot] avatar Nov 24 '21 06:11 stale[bot]

is this fixed?

3nprob avatar Nov 24 '21 06:11 3nprob

This issue has been automatically marked as "stale" because it has not had any activity for a while. It will be closed in 90 days if no further activity occurs. Thank you for your contributions.

stale[bot] avatar Nov 26 '22 21:11 stale[bot]

is this fixed?

3nprob avatar Mar 17 '23 05:03 3nprob