Creating concurrent snapshots / trackers of multiple block devices terminates the first one uncleanly
Distribution
Debian 12
Architecture
amd64
Kernel version
Linux 6.1.0-32-amd64 #1 SMP PREEMPT_DYNAMIC Debian 6.1.129-1 (2025-03-06) x86_64 GNU/Linux
Blksnap version
VAL-6.1 branch (6.1.2.1781) and also VAL-6.3 branch (6.3.0.73)
Bug description
When creating concurrent snapshots/trackers of different block devices, creating the second one causes the first one to be cleaned up in an unclean way.
Steps to reproduce
- Ordinary GPT disk with EXT4 partitions: sda (sda1, sda2, sda3)
- Create tracker1 + snapshot1 for the first block device:
  - `ID=$(blksnap snapshot_create --device /dev/sda1 | grep ...)`
  - `blksnap stretch_snapshot --id $ID --path /.some-temp-path-snapshot1 --limit 1024 &`
  - `blksnap snapshot_take --id $ID`
  - `blksnap snapshot_collect --id $ID` # find maj:min device for the snapshot
- Create tracker2 + snapshot2 for the second block device:
  - Same 4 commands, but for e.g. /dev/sda2
- Free snapshot2 + tracker2 for the second block device:
  - `blksnap snapshot_destroy $ID`
  - kill the stretch_snapshot $PID...
  - wait for the stretch_snapshot $PID...
  - `blksnap tracker_remove --device /dev/sda2`
  - `rm /.some-temp-path-snapshot2* || true`
- Free snapshot1 + tracker1 for the first block device:
  - Same steps
Expected behavior
The snapshot/tracker for both devices should be safely removed.
What happens instead: after creating the 2nd device's snapshot/tracker,
- the first snapshot disappears
- the first stretch_snapshot PID exits
- the snapshot_collect output includes only the snapshot from the 2nd device, not the first
- the first tracker cannot be removed; it is stuck with a "device busy" error
- the first block device cannot be snapshotted again
There are these errors in dmesg:
```
Failed to copy data to diff storage with error 4
Unable to destory snapshot: cannot find snapshot by id ...
Removing device [8:1] from tracking
Tracker for device [8:1] is busy with a snapshot
```
Additional information
I'm happy to provide any other information you need.
When the first snapshot is "damaged" in this way (some kind of unclean shutdown), `rmmod blksnap` also hangs indefinitely.
Hi.
Did I understand you correctly that device /dev/sda1 belongs to the first snapshot, and device /dev/sda2 belongs to the second? But when you delete the second snapshot, do you remove the /dev/sda1 device from tracking? I think it should be /dev/sda2.
In any case, I'll try to do a test for both scenarios. Even wrong actions should not lead to hangups.
Thanks for the feedback.
> Did I understand you correctly that device /dev/sda1 belongs to the first snapshot, and device /dev/sda2 belongs to the second? But when you delete the second snapshot, do you remove the /dev/sda1 device from tracking? I think it should be /dev/sda2.
Sorry, of course it's /dev/sda2. I updated the post to be more clear.
I confirm this issue (test "Two snapshots"). This is not a problem for the VAL project, as it does not allow creating more than one snapshot. I plan to fix the problem in one of the next releases.
I found the problem.
```diff
diff --git a/module/snapshot.c b/module/snapshot.c
index 2b59fcd..d6aaef4 100644
--- a/module/snapshot.c
+++ b/module/snapshot.c
@@ -413,7 +413,7 @@ int snapshot_create(struct blk_snap_dev_t *dev_id_array, unsigned int count,
 	}
 	down_write(&snapshots_lock);
-	list_add_tail(&snapshots, &snapshot->link);
+	list_add_tail(&snapshot->link, &snapshots);
 	up_write(&snapshots_lock);
 	uuid_copy(id, &snapshot->id);
```
Unfortunately, the fix is not included in release 6.3.1.
Thank you for the quick repro + patch! I can confirm that the patch above resolves the issue for me too.
Fixed in release 6.3.2. Closing.