btrfs send fails after non-simultaneous use of bees

Open automorphism88 opened this issue 5 years ago • 20 comments

I ran bees on a filesystem containing a parent snapshot and then tried an incremental send from that snapshot. Bees ran between taking the snapshot and the send, but not simultaneously with the send (bees was fully stopped before the send started). The send fails with the following in dmesg:

[57807.246706] BTRFS error (device dm-2): Send: inconsistent snapshot, found updated extent for inode 8924138 without updated inode item, send root is 4457, parent root is 4425

This is with kernel 5.1.8 and bees 0.6.1.

automorphism88 avatar Jun 13 '19 02:06 automorphism88
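The error line above carries three identifiers (the inode number, the send root, and the parent root) that are useful when reporting or investigating the problem. A minimal sketch for pulling them out of kernel log text follows. The function name `parse_send_errors` and the output format are hypothetical, chosen only for illustration, and the pattern assumes the message wording matches the line quoted in this report.

```shell
#!/bin/sh
# Extract identifiers from btrfs "Send: inconsistent snapshot" kernel messages.
# Reads kernel log text on stdin and prints one summary line per matching message.
# Assumption: the message wording matches the dmesg line quoted in this issue.
parse_send_errors() {
    sed -n -E 's/.*Send: inconsistent snapshot, found updated extent for inode ([0-9]+) without updated inode item, send root is ([0-9]+), parent root is ([0-9]+).*/inode=\1 send_root=\2 parent_root=\3/p'
}
```

Typical use would be `dmesg | parse_send_errors`. The reported inode number can then be mapped to a path inside the snapshot with `btrfs inspect-internal inode-resolve`, which may help identify which file triggered the error.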

OK, so send had two bugs after all. Are you willing to report this to the linux-btrfs mailing list?

Zygo avatar Jun 13 '19 02:06 Zygo

OK, so send had two bugs after all. Are you willing to report this to the linux-btrfs mailing list?

I've never posted to the mailing list before but I'm sure I can figure out how. What about the kernel bugzilla? I've posted there before.

automorphism88 avatar Jun 13 '19 03:06 automorphism88

Kernel bugzilla is worth a shot. Best case, it's a simple fix. Worst case, it'll get ignored, and I'll forward it to the mailing list when I get a few spare cycles (possibly after a bit more analysis).

Zygo avatar Jun 13 '19 23:06 Zygo

https://bugzilla.kernel.org/show_bug.cgi?id=203933

automorphism88 avatar Jun 19 '19 17:06 automorphism88

Does this happen with the other duperemove tool too, or just bees?

ghost avatar Jun 28 '19 12:06 ghost

Does this happen with the other duperemove tool too, or just bees?

It's a kernel bug, so it should affect all deduplicators on btrfs.

Zygo avatar Jun 29 '19 01:06 Zygo

So I tried the following:

$ mkfs.btrfs -f /dev/sdb
$ mount /dev/sdb /mnt

$ xfs_io -f -c "pwrite -S 0xab 0 1M" /mnt/foo

$ btrfs subvolume snapshot -r /mnt /mnt/snap1

$ xfs_io -f -c "pwrite -S 0xab 0 1M" /mnt/bar

$ btrfs subvolume snapshot -r /mnt /mnt/snap2

# deduplicate foo into bar, so that both point to the same extent(s)
$ xfs_io -c "dedupe /mnt/snap2/foo 0 0 1M" /mnt/snap2/bar

# do the incremental send, see if it fails
$ btrfs send -p /mnt/snap1 -f /dev/null /mnt/snap2
$ echo $?
0

dmesg/syslog is also clean. Applying send streams to a filesystem also shows both files are there and with correct content.

Can you provide more details on how the deduplication is being done exactly? Full, just same extents, order, etc. Also, any special mount options?

Thanks.

fdmanana avatar Jul 02 '19 11:07 fdmanana

I'm not sure how xfs_io works. But bees does not dedupe extents directly; instead, it writes new extents, based on the hash table, into a temporary file, then dedupes all discovered extents against this temporary file (which is then deleted). This may be quite different from what xfs_io does and could well affect what btrfs send sees as the result.

kakra avatar Jul 02 '19 11:07 kakra

Where is this temporary file created? It does seem plausible that this could be the cause for the send error. Or rather, is send correct that the ro snapshots did in fact change?

ghost avatar Jul 02 '19 12:07 ghost

@Gatak More or less "nowhere"... I think bees creates it in the root subvolume, acquires an open file descriptor for it, then immediately deletes it, and only then writes file data to the FD. So it's writing to an anonymous, btrfs-backed file. If you are asking whether the file is created in the RO snapshot: no, it isn't.

@Zygo may know more but probably deduping extents from the RO snapshot to this newly created file removes the original extents and thus "modifies" the snapshot (not the file contents but the extent structure). But I think exactly that should've been fixed in bees already by ignoring RO snapshots.

kakra avatar Jul 02 '19 12:07 kakra

But I think exactly that should've been fixed in bees already by ignoring RO snapshots.

Then the space taken by duplicates in RO snapshots is neither considered nor reduced, which was the point to begin with, wasn't it?

ghost avatar Jul 02 '19 20:07 ghost

@Gatak, @kakra: Temporary files are created in the root subvol with O_TMPFILE. They never have names, they are created by the kernel with a zero link count. The root subvol is used for temporaries because it's a part of the filesystem that necessarily exists and is writable, and dedupe doesn't care where the temporary extents live. If bees can dedupe a dst extent out of existence using extents that already exist, bees just does that. If no suitable extent is found (e.g. existing extents contain a mix of duplicate and unique data), bees creates a new extent with the right size and content to use as a src extent.

To the snapshots, the bees temporary files are just files in another subvolume with a higher transid. They should have the same effect as the second pwrite.

Zygo avatar Jul 03 '19 00:07 Zygo
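To make the idea of a nameless file with a zero link count concrete, here is a rough shell sketch. It is not what bees does: per the comment above, bees uses O_TMPFILE, so the file never has a name at all. A shell cannot pass O_TMPFILE, so this sketch uses the older create, open, unlink sequence, which ends in a similar state (an open descriptor on a file with no name). The function names are hypothetical, and the sketch assumes Linux with /proc mounted, GNU `stat`, and that file descriptor 3 is free.

```shell
#!/bin/sh
# Approximate a nameless temporary file from the shell.
# Assumptions: Linux with /proc mounted, GNU stat, and fd 3 unused.
# This is the create/open/unlink pattern, NOT O_TMPFILE itself.
make_anon_file() {
    tmp=$(mktemp) || return 1
    exec 3<>"$tmp"          # keep the file open for read/write on fd 3
    rm -f "$tmp"            # remove its only name; the data stays reachable via fd 3
    printf 'temporary extent data\n' >&3
}

# Print the link count of the file behind fd 3.
anon_link_count() {
    stat -L -c %h /proc/self/fd/3
}
```

After calling `make_anon_file`, `anon_link_count` reports the link count of the open file, and `cat /proc/self/fd/3` reads the data back even though no directory entry refers to it. The file's space is released once the descriptor is closed (`exec 3>&-`).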

@fdmanana That script looks right, in the sense that bees does something similar, but I haven't reproduced this myself, and I don't think it's quite that simple. We probably need to set up a larger, more realistic test (e.g. copy /usr into a subvol instead of just one extent), run bees and send until it fails, then try to figure out what happened to the filesystem when the error is detected.

Zygo avatar Jul 03 '19 00:07 Zygo

So I managed to find out exactly how it happens. It's not trivial to reproduce and happens somewhat randomly, so it's no wonder I have never hit it or had other user reports before. I'll send a fix soon (this week) to the btrfs mailing list.

No need to use bees for triggering this. Thanks.

fdmanana avatar Jul 16 '19 15:07 fdmanana

Sent:

https://lore.kernel.org/linux-btrfs/[email protected]/T/#u

https://lore.kernel.org/linux-btrfs/[email protected]/T/#u

fdmanana avatar Jul 17 '19 12:07 fdmanana

This issue started happening back in 2015 when deduplication was updated to not update the inode's ctime and mtime and update only the iversion.

That sounds familiar: https://www.spinics.net/lists/linux-btrfs/msg45113.html Oops, my bad? ;)

Zygo avatar Jul 17 '19 18:07 Zygo

Kernel fix queued for 5.3 and will appear in the stable trees.

kdave avatar Jul 26 '19 14:07 kdave

@kdave could you let us know when it should have appeared in stable trees?

I'm waiting on this before trying bees to ensure btrbk backups continue to function.

HaleTom avatar Aug 12 '19 18:08 HaleTom

According to

for x in $(git log --date-order --all --grep=b4f9a1a8 --format=%h); do echo "$x: $(git tag --contains "$x")"; done

it is in 5.2.7, 4.19.65, 4.14.137, and 4.9.188.

Zygo avatar Aug 12 '19 19:08 Zygo
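For anyone wanting to check a running kernel against the versions listed above, a small sketch follows. The function names are hypothetical. The thresholds come only from this thread: the four stable releases named above, plus the statement that the fix was queued for 5.3. Series not mentioned in the thread are reported as "unknown" rather than guessed, release-candidate kernels are not handled, and distribution kernels may carry backports that a version number alone cannot reveal. It assumes a `sort` that supports `-V` (GNU coreutils).

```shell
#!/bin/sh
# version_ge A B: succeed if version A >= version B.
# Assumption: sort supports -V (GNU coreutils).
version_ge() {
    [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

# has_send_fix KERNEL_VERSION: print "yes", "no", or "unknown".
# Thresholds are taken from the versions named in this thread only.
has_send_fix() {
    ver=${1%%-*}                                  # drop suffixes such as "-generic"
    series=$(printf '%s' "$ver" | cut -d. -f1,2)  # e.g. "4.19"
    case "$series" in
        5.2)  min=5.2.7 ;;
        4.19) min=4.19.65 ;;
        4.14) min=4.14.137 ;;
        4.9)  min=4.9.188 ;;
        *)
            # Not a listed stable series: 5.3 and later include the fix,
            # anything older is not covered by the information in this thread.
            if version_ge "$ver" 5.3; then echo yes; else echo unknown; fi
            return ;;
    esac
    if version_ge "$ver" "$min"; then echo yes; else echo no; fi
}
```

Usage: `has_send_fix "$(uname -r)"`.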

@HaleTom - I'm just diving into btrbk and bees. I'm seeing this issue is still open, so I'm wondering if it worked for you?

DiagonalArg avatar May 15 '21 11:05 DiagonalArg