Restoring a snapshot causes snapshot delete to block until drop_caches is called
Details
Setup
- Using kernel 6.11 (bcachefs-testing, commit fcd6549aad02f9abeb9184201c285e6b67b3f098).
- bcachefs is mounted at `/mnt/bcachefs`.
- `/opt` is a bind mount to `/mnt/bcachefs/opt`.
- Daily snapshots of `opt` are taken in `/mnt/bcachefs/snapshots/opt` (see the sketch after this list).
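For concreteness, a minimal sketch of this setup. The bind mount is taken from the description above; the daily snapshot command is an assumption modeled on the `@GMT-...` timestamped name used in the rollback below, not the actual job from this report.

```sh
# Bind mount as described above (normally set up via /etc/fstab or a mount unit).
mount --bind /mnt/bcachefs/opt /opt

# Hypothetical daily snapshot job; the real script/scheduler is not shown in
# this report. The name format matches the @GMT-... snapshot restored below.
bcachefs subvolume snapshot /mnt/bcachefs/opt \
    "/mnt/bcachefs/snapshots/opt/@GMT-$(date -u +%Y.%m.%d-%H.%M.%S)"
```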
Rollback operation:
- While JetBrains Toolbox was updating all applications, my computer crashed, and I wanted to start from a clean state.
- Confirm with `fuser -vm /opt` that no application is using `/opt`; nothing is configured to use `/mnt/bcachefs/opt` directly.
- `umount /opt`
- `cd /mnt/bcachefs`
- `mv opt opt.dead`
- `bcachefs subvolume snapshot snapshots/opt/@GMT-2024.08.26-05.00.18/ opt`
- `ls opt`: all files are present.
- `mount --bind /mnt/bcachefs/opt /opt`
- Restart services.
- `bcachefs subvolume del ./opt.dead` (the full sequence is collected into a script after this list)
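Collected into one script for readability. This is a sketch of the exact steps above; `set -e` and the `|| true` on `fuser` are my additions (`fuser` exits non-zero when nothing is using the mount).

```sh
#!/bin/sh
set -e

# Verify nothing is holding /opt open (fuser exits non-zero when idle).
fuser -vm /opt || true

umount /opt
cd /mnt/bcachefs

# Keep the broken subvolume around until the restored one is verified.
mv opt opt.dead
bcachefs subvolume snapshot snapshots/opt/@GMT-2024.08.26-05.00.18/ opt
ls opt    # sanity check: all files are present

mount --bind /mnt/bcachefs/opt /opt
# ... restart services here ...

# Finally, delete the old subvolume; this is the step that later blocks.
bcachefs subvolume del ./opt.dead
```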
Problem
- Everything starts as expected; I/O is happening on all devices.
- I restart the updates from JetBrains Toolbox.
- Repeating stack traces appear in dmesg: stack1.txt
- Unmounting the filesystem fixes the issue and ends the deletion with (recovery sketched after this list):
```
bch2_delete_dead_snapshots: error deleting keys from dying snapshots erofs_trans_commit
bch2_delete_dead_snapshots: error erofs_trans_commit
shutdown complete, journal seq 34783919
```
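Reconstructed as commands, assuming no other users of the filesystem; the `dmesg | tail` is only there to observe the shutdown messages quoted above.

```sh
umount /opt             # drop the bind mount first
umount /mnt/bcachefs    # unmounting the filesystem ends the blocked deletion
dmesg | tail            # shows the bch2_delete_dead_snapshots errors above
```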
Reproducing the problem with more debug information
- Repeat the rollback and steps 1 and 2 above.
- A new debug message appears repeatedly:
```
bch2_evict_subvolume_inodes() waited 10 seconds for inode 671283974:6768 to go away: ref 1 state 65536
```
- `echo w > /proc/sysrq-trigger` shows the same stack for the blocked delete: stack2.txt
- After waiting about half an hour, issue `echo 3 > /proc/sys/vm/drop_caches`.
- The repeating log stops and multiple gigabytes of discard operations start on both NVMe devices (workaround scripted after this list).
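The same workaround as a script. The background `dmesg -w` watch and the `sync` before dropping caches are my additions (a common precaution), not steps from the report.

```sh
# Optional: watch for the repeating evict message while waiting.
dmesg -w | grep --line-buffered bch2_evict_subvolume_inodes &

# Once the delete has been blocked for a while, drop clean caches;
# in this report that stops the log spam and lets the deletion proceed
# (followed by multi-gigabyte discards on both NVMe devices).
sync
echo 3 > /proc/sys/vm/drop_caches
```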