
Zvols get lost upon host startup and VM shutdown

Status: Open. ChristophSchmidpeter opened this issue 3 years ago (4 comments).

System information

Type                  Version/Name
Distribution Name     Arch Linux
Distribution Version  N/A
Kernel Version        5.14.2-zen
Architecture          x64
OpenZFS Version       2.1.1

Describe the problem you're observing

Zvols go missing at random on host startup and on virtual machine shutdown.

Describe how to reproduce the problem

  • Start Linux, or shut down a VM, a few times (reproducible in roughly 50% of startups/VM shutdowns)
  • Notice that zvol devices are missing under /dev (but their partitions are still there)
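A quick way to check for this symptom is to compare the zvols ZFS knows about against the links under /dev/zvol. The sketch below is illustrative only: it assumes links live at /dev/zvol/<pool>/<dataset> (the layout shown later in this thread) and that `zfs list -H -o name -t volume` is available on the host. The function names `list_zvols` and `missing_zvol_links` are hypothetical, not part of OpenZFS.

```python
import os
import subprocess


def list_zvols() -> list[str]:
    """Return all zvol names (e.g. "puddle/t2/D4") as reported by `zfs list`."""
    # -H: no header, tab-separated; -o name: name column only; -t volume: zvols only
    out = subprocess.run(
        ["zfs", "list", "-H", "-o", "name", "-t", "volume"],
        check=True, capture_output=True, text=True,
    ).stdout
    return [line for line in out.splitlines() if line]


def missing_zvol_links(zvols: list[str], dev_root: str = "/dev/zvol") -> list[str]:
    """Return the zvols whose <dev_root>/<name> link does not exist at all."""
    # os.path.lexists is True even for a dangling symlink, so this only
    # flags names whose link is absent, which is the symptom reported here.
    return [z for z in zvols if not os.path.lexists(os.path.join(dev_root, z))]
```

On an affected host, `print(missing_zvol_links(list_zvols()))` would list the zvols whose links have vanished; `dev_root` is a parameter only so the check can be exercised outside /dev.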

Include any warning/errors/backtraces from the system logs

ChristophSchmidpeter, Nov 01 '21 14:11

It seems to be related to, but distinct from, #12507 (the reproduction steps differ, and this problem happens much more often).

ChristophSchmidpeter, Nov 01 '21 14:11

I think I've encountered the same bug on Fedora 35 with zfs-2.1.1. I've created a thread for discussing this problem here:

https://zfsonlinux.topicbox.com/groups/zfs-discuss/T32674c8840d47c60/zfs-2-1-1-zvol-links-disappearing-upon-vm-shutdown

Some of the text below has been pasted from one of my forum posts. I've also written a script that restores zvol links after they disappear. I regard this script as a workaround until a proper fix can be developed and released. See the gzipped attachment containing the script:

fix-zvol-links.gz

When shutting down a VM, I've found that links under /dev/zvol/POOLNAME sometimes disappear. (The underlying /dev/zdX device itself does not go missing.) You stand a very good chance of reproducing the problem if many virtual disks (in the form of zvols) are assigned to the VM.

To understand this problem better, I made a VM running Fedora 35 and assigned it a total of 5 virtual disks: one for the OS and four others named D1, D2, D3, and D4. I gave each of D1..D4 a GPT partition table with a single partition. During my testing, I did nothing more than start the VM, log in via ssh, and run "shutdown -h now". At each step along the way, I looked at the state of the zvol links on the virtualization host. This is what I saw...

This is what the zvol links (should) look like on the virtualization host:

[root@ocotillo t2]# ls -l
total 0
lrwxrwxrwx 1 root root 14 Nov 13 18:09 D1 -> ../../../zd528
lrwxrwxrwx 1 root root 16 Nov 13 18:09 D1-part1 -> ../../../zd528p1
lrwxrwxrwx 1 root root 14 Nov 13 18:04 D2 -> ../../../zd544
lrwxrwxrwx 1 root root 16 Nov 13 18:04 D2-part1 -> ../../../zd544p1
lrwxrwxrwx 1 root root 14 Nov 13 18:04 D3 -> ../../../zd560
lrwxrwxrwx 1 root root 16 Nov 13 18:04 D3-part1 -> ../../../zd560p1
lrwxrwxrwx 1 root root 14 Nov 13 18:04 D4 -> ../../../zd576
lrwxrwxrwx 1 root root 16 Nov 13 18:04 D4-part1 -> ../../../zd576p1
lrwxrwxrwx 1 root root 14 Nov 13 18:04 f35-s2 -> ../../../zd608
lrwxrwxrwx 1 root root 16 Nov 13 18:04 f35-s2-part1 -> ../../../zd608p1
lrwxrwxrwx 1 root root 16 Nov 13 18:04 f35-s2-part2 -> ../../../zd608p2
lrwxrwxrwx 1 root root 16 Nov 13 18:04 f35-s2-part3 -> ../../../zd608p3

After starting the VM, nothing changes, so I won't show that listing. But after shutting it down, I see:

lrwxrwxrwx 1 root root 14 Nov 13 18:15 D1 -> ../../../zd528
lrwxrwxrwx 1 root root 16 Nov 13 18:15 D1-part1 -> ../../../zd528p1
lrwxrwxrwx 1 root root 14 Nov 13 18:15 D2 -> ../../../zd544
lrwxrwxrwx 1 root root 16 Nov 13 18:15 D2-part1 -> ../../../zd544p1
lrwxrwxrwx 1 root root 14 Nov 13 18:15 D3 -> ../../../zd560
lrwxrwxrwx 1 root root 16 Nov 13 18:15 D3-part1 -> ../../../zd560p1
lrwxrwxrwx 1 root root 16 Nov 13 18:15 D4-part1 -> ../../../zd576p1
lrwxrwxrwx 1 root root 14 Nov 13 18:15 f35-s2 -> ../../../zd608
lrwxrwxrwx 1 root root 16 Nov 13 18:15 f35-s2-part1 -> ../../../zd608p1
lrwxrwxrwx 1 root root 16 Nov 13 18:15 f35-s2-part2 -> ../../../zd608p2
lrwxrwxrwx 1 root root 16 Nov 13 18:15 f35-s2-part3 -> ../../../zd608p3
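Rather than comparing the two listings by eye, the link names can be diffed. This small sketch (not part of the original report) parses `ls -l` output like the listings above; it assumes link names contain no spaces, which holds here. The inputs are excerpts (D3 and D4 only) of the before and after listings, and the helper names are hypothetical.

```python
def link_names(ls_output: str) -> set[str]:
    """Extract symlink names from `ls -l` output ("name -> target" lines)."""
    names = set()
    for line in ls_output.splitlines():
        if " -> " not in line:
            continue  # skip "total 0", blank lines, and non-links
        left, _target = line.split(" -> ", 1)
        names.add(left.split()[-1])  # the link name is the last field before "->"
    return names


def missing_links(before: str, after: str) -> list[str]:
    """Return link names present before VM shutdown but absent afterwards."""
    return sorted(link_names(before) - link_names(after))


# Excerpts of the two listings above (D3 and D4 only).
BEFORE = """\
lrwxrwxrwx 1 root root 14 Nov 13 18:04 D3 -> ../../../zd560
lrwxrwxrwx 1 root root 16 Nov 13 18:04 D3-part1 -> ../../../zd560p1
lrwxrwxrwx 1 root root 14 Nov 13 18:04 D4 -> ../../../zd576
lrwxrwxrwx 1 root root 16 Nov 13 18:04 D4-part1 -> ../../../zd576p1
"""
AFTER = """\
lrwxrwxrwx 1 root root 14 Nov 13 18:15 D3 -> ../../../zd560
lrwxrwxrwx 1 root root 16 Nov 13 18:15 D3-part1 -> ../../../zd560p1
lrwxrwxrwx 1 root root 16 Nov 13 18:15 D4-part1 -> ../../../zd576p1
"""

print(missing_links(BEFORE, AFTER))  # ['D4']
```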

Note that the link for D4 is missing (D4-part1 is still there). When I attempt to power up the VM, this message pops up:

Error starting domain: Cannot access storage file '/dev/zvol/puddle/t2/D4': No such file or directory

Of course, after running fix-zvol-links (see attachment), I can power on the VM again.
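The contents of fix-zvol-links are not reproduced in this thread. One approach sometimes used for this kind of problem, offered here purely as an assumption and not as what that script does, is to ask udev to replay "add" events for the zd* block devices so that the OpenZFS udev rules regenerate the /dev/zvol links. Whether that reliably restores the links for this bug is untested here. The hypothetical helper below only builds the `udevadm trigger` and `udevadm settle` command lines and executes them when `dry_run` is false (root required).

```python
import subprocess


def udev_replay_commands() -> list[list[str]]:
    """Build udevadm commands that re-emit 'add' events for zvol block devices."""
    return [
        # Re-trigger only block devices whose sysfs name matches zd* (zvols and
        # their partitions). No shell is involved, so udevadm itself does the
        # glob matching on "zd*".
        ["udevadm", "trigger", "--action=add",
         "--subsystem-match=block", "--sysname-match=zd*"],
        # Block until udev has finished processing the queued events.
        ["udevadm", "settle"],
    ]


def restore_zvol_links(dry_run: bool = True) -> None:
    """Print each command; run it only when dry_run is False (needs root)."""
    for cmd in udev_replay_commands():
        print(" ".join(cmd))
        if not dry_run:
            subprocess.run(cmd, check=True)
```

Calling `restore_zvol_links()` only prints the commands; `restore_zvol_links(dry_run=False)` as root would actually run them.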

I've conducted a number of trials similar to what's shown above. Sometimes none of the links are missing after shutdown. On some occasions all of the links were missing! When just a few are missing, it seems pretty random which are missing and which aren't.

I haven't yet seen any links disappear when starting a VM; for me it only happens upon VM shutdown.

Also, I've seen what may be a related problem involving zfs rename, but I'll file a separate bug report for that problem.

KevinBuettner, Nov 15 '21 03:11

I have not been able to reproduce this problem after applying the commit from https://github.com/openzfs/zfs/pull/12759.

KevinBuettner, Nov 24 '21 22:11

This issue has been automatically marked as "stale" because it has not had any activity for a while. It will be closed in 90 days if no further activity occurs. Thank you for your contributions.

stale[bot], Nov 26 '22 21:11