Creating concurrent snapshots / trackers of multiple block devices terminates the first one uncleanly

Open mappu opened this issue 10 months ago • 6 comments

Distribution

Debian 12

Architecture

amd64

Kernel version

Linux 6.1.0-32-amd #1 SMP PREEMPT_DYNAMIC Debian 6.1.129-1 (2025-03-06) x86_64 GNU/Linux

Blksnap version

VAL-6.1 branch (6.1.2.1781) and also VAL-6.3 branch (6.3.0.73)

Bug description

When creating concurrent snapshots/trackers of different block devices, creating the second one causes the first one to be torn down uncleanly.

Steps to reproduce

  1. Ordinary GPT disk with EXT4 partitions: sda (sda1, sda2, sda3)
  2. Create tracker1 + snapshot1 for first block device
    • ID=$(blksnap snapshot_create --device /dev/sda1 | grep ...)
    • blksnap stretch_snapshot --id $ID --path /.some-temp-path-snapshot1 --limit 1024 &
    • blksnap snapshot_take --id $ID
    • blksnap snapshot_collect --id $ID # find maj:min device for the snapshot
  3. Create tracker2 + snapshot2 for second block device
    • Same 4 commands but for e.g. /dev/sda2
  4. Free snapshot2 + tracker2 for second block device
    • blksnap snapshot_destroy --id $ID
    • kill the stretch_snapshot $PID...
    • wait for the stretch_snapshot $PID...
    • blksnap tracker_remove --device /dev/sda2
    • rm /.some-temp-path-snapshot2* || true
  5. Free snapshot1 + tracker1 for first block device
    • Same steps

Expected behavior

The snapshot/tracker for both devices should be safely removed.

What happens instead, after creating the second device's snapshot/tracker:

  • the first snapshot disappears
  • the first stretch_snapshot PID exits
  • the snapshot_collect output includes only the snapshot from the 2nd device, not the first
  • the first tracker cannot be removed; it fails with a device-busy error
  • the first block device cannot be snapshotted again

There are these errors in dmesg:

Failed to copy data to diff storage with error 4
Unable to destory snapshot: cannot find snapshot by id ...
Removing device [8:1] from tracking
Tracker for device [8:1] is busy with a snapshot

Additional information

I'm happy to provide any other information that would help.

mappu avatar Mar 18 '25 02:03 mappu

When the first snapshot is "damaged" in this way (some kind of unclean shutdown), an rmmod blksnap also hangs indefinitely.

mappu avatar Mar 18 '25 03:03 mappu

Hi.

Did I understand you correctly that device /dev/sda1 belongs to the first snapshot, and device /dev/sda2 belongs to the second? But when you delete the second snapshot, do you remove the /dev/sda1 device from tracking? I think it should be /dev/sda2.

In any case, I'll try to do a test for both scenarios. Even wrong actions should not lead to hangups.

Thanks for the feedback.

SergeiShtepa avatar Mar 18 '25 10:03 SergeiShtepa

> Did I understand you correctly that device /dev/sda1 belongs to the first snapshot, and device /dev/sda2 belongs to the second? But when you delete the second snapshot, do you remove the /dev/sda1 device from tracking? I think it should be /dev/sda2.

Sorry, of course it's /dev/sda2. I updated the post to be more clear.

mappu avatar Mar 19 '25 01:03 mappu

I can confirm this issue (test: Two snapshots). This is not a problem for the VAL project, as it does not allow creating more than one snapshot. I plan to fix the problem in one of the next releases.

SergeiShtepa avatar Mar 19 '25 15:03 SergeiShtepa

I found the problem.

diff --git a/module/snapshot.c b/module/snapshot.c
index 2b59fcd..d6aaef4 100644
--- a/module/snapshot.c
+++ b/module/snapshot.c
@@ -413,7 +413,7 @@ int snapshot_create(struct blk_snap_dev_t *dev_id_array, unsigned int count,
 	}
 
 	down_write(&snapshots_lock);
-	list_add_tail(&snapshots, &snapshot->link);
+	list_add_tail(&snapshot->link, &snapshots);
 	up_write(&snapshots_lock);
 
 	uuid_copy(id, &snapshot->id);

Unfortunately, the fix is not included in release 6.3.1.

SergeiShtepa avatar Mar 19 '25 17:03 SergeiShtepa

Thank you for the quick repro + patch! I can confirm that the patch above resolves the issue for me too.

mappu avatar Mar 19 '25 23:03 mappu

Fixed in release 6.3.2. Closing.

SergeiShtepa avatar Jun 19 '25 12:06 SergeiShtepa