blksnap Creating concurrent snapshots / trackers of multiple block devices terminates the first one uncleanly

Distribution

Debian 12

Architecture

amd64

Kernel version

Linux 6.1.0-32-amd #1 SMP PREEMPT_DYNAMIC Debian 6.1.129-1 (2025-03-06) x86_64 GNU/Linux

Blksnap version

VAL-6.1 branch (6.1.2.1781) and also VAL-6.3 branch (6.3.0.73)

Bug description

When snapshots/trackers are created concurrently for different block devices, creating the second one causes the first one to be torn down uncleanly.

Steps to reproduce

Ordinary GPT disk with EXT4 partitions: sda (sda1, sda2, sda3)
Create tracker1 + snapshot1 for first block device
- ID=$(blksnap snapshot_create --device /dev/sda1 | grep ...)
- blksnap stretch_snapshot --id $ID --path /.some-temp-path-snapshot1 --limit 1024 &
- blksnap snapshot_take --id $ID
- blksnap snapshot_collect --id $ID # find maj:min device for the snapshot
Create tracker2 + snapshot2 for second block device
- Same 4 commands but for e.g. /dev/sda2
Free snapshot2 + tracker2 for second block device
- blksnap snapshot_destroy $ID
- kill the stretch_snapshot $PID...
- wait for the stretch_snapshot $PID...
- blksnap tracker_remove --device /dev/sda2
- rm /.some-temp-path-snapshot2* || true
Free snapshot1 + tracker1 for first block device
- Same steps

Expected behavior

The snapshot/tracker for both devices should be safely removed.

What happens instead, after creating the second device's snapshot/tracker:

the first snapshot disappears
the first stretch_snapshot PID exits
the snapshot_collect output includes only the snapshot from the 2nd device, not the first
The first tracker cannot be removed; removal fails with a "device busy" error.
The first block device snapshotted cannot be snapshotted again.

There are these errors in dmesg:

Failed to copy data to diff storage with error 4
Unable to destory snapshot: cannot find snapshot by id ...
Removing device [8:1] from tracking
Tracker for device [8:1] is busy with a snapshot

Additional information

I'm happy to provide any other information that would help.

Mar 18 '25 02:03 mappu

When the first snapshot is "damaged" in this way (some kind of unclean shutdown), an rmmod blksnap also hangs indefinitely.

Mar 18 '25 03:03 mappu

Hi.

Did I understand you correctly that device /dev/sda1 belongs to the first snapshot, and device /dev/sda2 belongs to the second? But when you delete the second snapshot, do you remove the /dev/sda1 device from tracking? I think it should be /dev/sda2.

In any case, I'll try to do a test for both scenarios. Even wrong actions should not lead to hangups.

Thanks for the feedback.

Mar 18 '25 10:03 SergeiShtepa

> Did I understand you correctly that device /dev/sda1 belongs to the first snapshot, and device /dev/sda2 belongs to the second? But when you delete the second snapshot, do you remove the /dev/sda1 device from tracking? I think it should be /dev/sda2.

Sorry, of course it's /dev/sda2. I updated the post to be more clear.

Mar 19 '25 01:03 mappu

I confirm this issue (test: "Two snapshots"). This is not a problem for the VAL project, as it does not allow creating more than one snapshot. I plan to fix the problem in one of the next releases.

Mar 19 '25 15:03 SergeiShtepa

I found the problem.

diff --git a/module/snapshot.c b/module/snapshot.c
index 2b59fcd..d6aaef4 100644
--- a/module/snapshot.c
+++ b/module/snapshot.c
@@ -413,7 +413,7 @@ int snapshot_create(struct blk_snap_dev_t *dev_id_array, unsigned int count,
 	}
 
 	down_write(&snapshots_lock);
-	list_add_tail(&snapshots, &snapshot->link);
+	list_add_tail(&snapshot->link, &snapshots);
 	up_write(&snapshots_lock);
 
 	uuid_copy(id, &snapshot->id);

Unfortunately, the fix is not included in release 6.3.1.
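
For readers unfamiliar with the kernel list helpers, here is a minimal user-space sketch of why the swapped arguments make the first snapshot disappear. It reimplements only the relevant parts of the list API: `list_add_tail()` takes the new entry first and the list head second. The names `fake_snapshot`, `list_count()` and `count_after_two_adds()` are hypothetical and exist only for this illustration. It also assumes that `snapshot->link` is self-linked (as after `INIT_LIST_HEAD`) before the add, which this thread does not confirm.

```c
#include <stddef.h>

/* Minimal user-space stand-in for the kernel's circular doubly linked list. */
struct list_head {
	struct list_head *next, *prev;
};

static void INIT_LIST_HEAD(struct list_head *head)
{
	head->next = head;
	head->prev = head;
}

/* Same argument order as the kernel helper: (new entry, list head). */
static void list_add_tail(struct list_head *new_entry, struct list_head *head)
{
	struct list_head *prev = head->prev;

	head->prev = new_entry;
	new_entry->next = head;
	new_entry->prev = prev;
	prev->next = new_entry;
}

/* Count the entries reachable from head by following next pointers. */
static int list_count(const struct list_head *head)
{
	int n = 0;
	const struct list_head *pos;

	for (pos = head->next; pos != head; pos = pos->next)
		n++;
	return n;
}

/* Hypothetical stand-in for the module's snapshot object. */
struct fake_snapshot {
	int id;
	struct list_head link;
};

/* Register two snapshots; return how many the global list can still see. */
static int count_after_two_adds(int swapped)
{
	struct list_head snapshots;
	struct fake_snapshot s1 = { .id = 1 }, s2 = { .id = 2 };

	INIT_LIST_HEAD(&snapshots);
	INIT_LIST_HEAD(&s1.link); /* assumption: link starts out self-linked */
	INIT_LIST_HEAD(&s2.link);

	if (swapped) {
		/* Buggy order: the global head is spliced into each entry's ring. */
		list_add_tail(&snapshots, &s1.link);
		list_add_tail(&snapshots, &s2.link);
	} else {
		/* Fixed order: each entry is appended to the global list. */
		list_add_tail(&s1.link, &snapshots);
		list_add_tail(&s2.link, &snapshots);
	}
	return list_count(&snapshots);
}
```

Under these assumptions, `count_after_two_adds(0)` sees both snapshots, while `count_after_two_adds(1)` sees only the second one. The second buggy call re-points the global head at the new entry and leaves the first entry unreachable. That would be consistent with the "cannot find snapshot by id" message reported above, and with the bug staying hidden while only one snapshot exists.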

Mar 19 '25 17:03 SergeiShtepa

Thank you for the quick repro + patch! I can confirm that the patch above resolves the issue for me too.

Mar 19 '25 23:03 mappu

Fixed in release 6.3.2. Closing.

Jun 19 '25 12:06 SergeiShtepa