
Can't find or match chunks on subvolume which uses blake2 csum

broetchenrackete36 opened this issue on Jul 28 '20 · 10 comments

Running dduper on a subvolume doesn't seem to work. Both directories contain the same two files; both are aborted dd copies of my boot drive.

Output from the subvolume:

[bluemond@BlueQ dduper]$ sudo python2 ./dduper --device /dev/sda1 --dir /btrfs/subvol/ddtest/ --dry-run
Prefect match :  /btrfs/subvol/ddtest/sbd.img /btrfs/subvol/ddtest/sbd.img2
Summary
blk_size : 4KB  chunksize : 8192KB
/btrfs/subvol/ddtest/sbd.img has 0 chunks
/btrfs/subvol/ddtest/sbd.img2 has 0 chunks
Matched chunks: 0
Unmatched chunks: 0
Total size(KB) available for dedupe: 0
dduper took 32.3749928474 seconds
[bluemond@BlueQ dduper]$ sudo python2 ./dduper --device /dev/sda1 --dir /btrfs/subvol/ddtest/
Prefect match :  /btrfs/subvol/ddtest/sbd.img /btrfs/subvol/ddtest/sbd.img2
************************
Dedupe completed for /btrfs/subvol/ddtest/sbd.img:/btrfs/subvol/ddtest/sbd.img2
Summary
blk_size : 4KB  chunksize : 8192KB
/btrfs/subvol/ddtest/sbd.img has 0 chunks
/btrfs/subvol/ddtest/sbd.img2 has 0 chunks
Matched chunks: 0
Unmatched chunks: 0
Total size(KB) deduped: 0
dduper took 32.7617127895 seconds

Output from the root volume:

[bluemond@BlueQ dduper]$ sudo python2 ./dduper --device /dev/sda1 --dir /btrfs/ddtest/ --dry-run
Summary
blk_size : 4KB  chunksize : 32KB
/btrfs/ddtest/sbd.img has 184064 chunks
/btrfs/ddtest/sbd.img2 has 84480 chunks
Matched chunks: 32066
Unmatched chunks: 52414
Total size(KB) available for dedupe: 1026112
dduper took 36.9195628166 seconds
[bluemond@BlueQ dduper]$ sudo python2 ./dduper --device /dev/sda1 --dir /btrfs/ddtest/
************************
Dedupe completed for /btrfs/ddtest/sbd.img:/btrfs/ddtest/sbd.img2
Summary
blk_size : 4KB  chunksize : 32KB
/btrfs/ddtest/sbd.img has 184064 chunks
/btrfs/ddtest/sbd.img2 has 84480 chunks
Matched chunks: 32066
Unmatched chunks: 52414
Total size(KB) deduped: 0
dduper took 204.889986038 seconds

Also I'm not sure why the total size deduped is 0 on the actual dedupe...

I am using blake2 as the csum on a 6-drive array with raid5 data and raid1 metadata.

broetchenrackete36 commented on Jul 28 '20

@broetchenrackete36 thanks. Could you please try the steps below and share the results?

Let's first check whether the dump-csum option is working properly. If this fails, dduper won't work.

btrfs inspect-internal dump-csum /btrfs/subvol/ddtest/sbd.img /dev/sda1  &> /tmp/subvol_csum1
btrfs inspect-internal dump-csum /btrfs/subvol/ddtest/sbd.img2 /dev/sda1 &> /tmp/subvol_csum2

btrfs inspect-internal dump-csum /btrfs/ddtest/sbd.img  /dev/sda1 &> /tmp/root_csum1
btrfs inspect-internal dump-csum /btrfs/ddtest/sbd.img2  /dev/sda1 &> /tmp/root_csum2

Please confirm the output files are non-empty and check that their md5sums are the same.

md5sum /tmp/subvol_csum{1,2}
md5sum /tmp/root_csum{1,2}

If this works, then the issue is with the Python script, which should be easier to solve. If dump-csum fails, then I need to re-create your setup and examine what's going on.

Lakshmipathi commented on Jul 28 '20
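
For reference, here is a minimal Python sketch (not part of dduper) that automates the check above: it runs the patched btrfs-progs dump-csum command for both subvolume files, then prints each output file's size and md5sum. The paths and device are the ones from this thread; adjust as needed.

import hashlib
import os
import subprocess

def dump_csum(src_file, device, out_path):
    # Equivalent of: btrfs inspect-internal dump-csum <file> <device> &> <out_path>
    with open(out_path, 'w') as out:
        subprocess.call(['btrfs', 'inspect-internal', 'dump-csum', src_file, device],
                        stdout=out, stderr=subprocess.STDOUT)

def md5(path):
    with open(path, 'rb') as fh:
        return hashlib.md5(fh.read()).hexdigest()

dump_csum('/btrfs/subvol/ddtest/sbd.img', '/dev/sda1', '/tmp/subvol_csum1')
dump_csum('/btrfs/subvol/ddtest/sbd.img2', '/dev/sda1', '/tmp/subvol_csum2')
for f in ('/tmp/subvol_csum1', '/tmp/subvol_csum2'):
    # An empty dump, or differing md5sums for identical files, means dduper
    # has nothing usable to compare.
    print('%s: %d bytes, md5 %s' % (f, os.path.getsize(f), md5(f)))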

Also I'm not sure why the total size deduped is 0 on the actual dedupe...

Before you try the above steps (https://github.com/Lakshmipathi/dduper/issues/8#issuecomment-664772029), can you grab the latest dduper file and check again in your environment? It's a one-line fix for "total size deduped is 0": dduper actually removed the duplicate data but printed the wrong info; it should now report correct values.

diff --git a/dduper b/dduper
index 20dbde7..8bde512 100755
--- a/dduper
+++ b/dduper
@@ -276,6 +276,7 @@ def display_summary(blk_size, chunk_sz, perfect_match_chunk_sz, src_file,
     global dst_file_sz
     if perfect_match == 1:
         chunk = perfect_match_chunk_sz
+        total_bytes_deduped = dst_file_sz
     else:
         chunk = chunk_sz

Lakshmipathi commented on Jul 28 '20
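
To make the effect of that one-liner concrete, here is a simplified sketch of the summary arithmetic (the variable names are illustrative, not dduper's actual ones): on a perfect match the whole destination file is deduped; otherwise only the matched chunks count.

def total_deduped_kb(perfect_match, dst_file_sz_kb, matched_chunks, chunk_kb):
    # Perfect match: the entire destination file duplicates the source,
    # so everything it occupies is reclaimed.
    if perfect_match:
        return dst_file_sz_kb
    # Partial match: only the chunks that matched are submitted for dedupe.
    return matched_chunks * chunk_kb

Judging by the diff, the perfect-match branch previously never set the deduped total, which appears to be why the summary printed "Total size(KB) deduped: 0" even though the data was deduplicated.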

Thanks for the response. I applied the fix but I still get 0 for total deduped size.

I also ran the dump-csum on the files in the subvolume and root volume. It produces nothing (empty file) on the subvolume and works fine on the root volume...

broetchenrackete36 commented on Jul 28 '20

Thanks for the response. I applied the fix but I still get 0 for total deduped size.

That's strange. If you run sudo python2 ./dduper --device /dev/sda1 --dir /btrfs/ddtest/ and check disk usage with sync && df, does it show any new free space, or does it remain the same?

It produces nothing (empty file) on the subvolume and works fine on the root volume

I haven't really tested the tool with subvolumes, but it should work with the root volume since it reads csums from there.

I am using blake2 as csum on a 6-drive raid5 data raid1 meta array.

How easy or hard is it to re-create your setup? Can you share sample RAID commands or a script? I can launch a cloud VM with the required devices and check.

Lakshmipathi commented on Jul 28 '20

I created the array like this:

sudo mkfs.btrfs -d raid5 -m raid1 -L BlueButter -f /dev/sda1 /dev/sdc1 /dev/sdd1 /dev/sde1 /dev/sdf1 /dev/sdg1 --csum blake2

And then mounted like this:

sudo mount -t btrfs -o clear_cache,space_cache=v2,noatime /dev/sda1 /btrfs/

And then simply created a new subvolume:

sudo btrfs subv create /btrfs/subvol

I checked whether dduper is freeing space, and it doesn't seem so when looking at df output. I even cp'd one of the files so that I had two identical files, and df didn't show a difference in available space... This could be related to raid5 though; df with raid5 is not really reliable...

broetchenrackete36 commented on Jul 28 '20
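
As a side note on verifying the result: free space reported by df can indeed be misleading on btrfs (especially with raid5), while btrfs filesystem du (btrfs-progs 4.6+) reports exclusive vs. shared usage per directory, so reflinked or deduped extents show up directly. A small sketch using the paths from this thread:

import subprocess

def report_sharing(path):
    # Output columns: Total / Exclusive / Set shared / Filename. After a
    # successful dedupe, "Set shared" should grow and "Exclusive" shrink.
    out = subprocess.check_output(['btrfs', 'filesystem', 'du', '-s', path])
    print(out.decode() if isinstance(out, bytes) else out)

report_sharing('/btrfs/ddtest/')
report_sharing('/btrfs/subvol/ddtest/')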

Thanks for the details. Let me check whether dduper can support a RAID setup.

Lakshmipathi commented on Jul 29 '20

Update: I tried the above setup and it gave me different errors:

bad tree block 22036480, bytenr mismatch, want=22036480, have=0
ERROR: cannot read chunk root
unable to open /dev/sda
bad tree block 22036480, bytenr mismatch, want=22036480, have=0
ERROR: cannot read chunk root
unable to open /dev/sda
Perfect match :  /mnt/f1 /mnt/f2
Summary
blk_size : 4KB  chunksize : 8192KB
/mnt/f1 has 1 chunks
/mnt/f2 has 1 chunks
Matched chunks: 1
Unmatched chunks: 0
Total size(KB) available for dedupe: 8192 
dduper took 1.42327594757 seconds

If I'm not wrong, I was able to reproduce the issue with the command below and suspect it may be related to --csum blake2. The same command worked with the default crc32.

mkfs.btrfs -m raid1 /dev/sda /dev/sdb -f --csum blake2

Need to examine further.

Lakshmipathi commented on Jul 30 '20

The issue is related to the blake2 csum. I don't know exactly why the blake2 csums fetched for files with the same content differ. Here is a simple way to reproduce the issue:

mkfs.btrfs /dev/sda --csum blake2

Now mount and run:

cp /tmp/a /mnt/f{1,2}
btrfs inspect-internal dump-csum /mnt/f1 /dev/sda &> /tmp/f1.csum
btrfs inspect-internal dump-csum /mnt/f2 /dev/sda &> /tmp/f2.csum

With the default crc32, the contents of /tmp/f1.csum and /tmp/f2.csum will match. But in this case, the csum files differ. I plan to explore this blake2 behaviour soon; until then I'll document a limitation that dduper doesn't support --csum blake2.

Lakshmipathi commented on Jul 31 '20

I added a fix for the new checksum types (xxhash64, blake2, sha256): https://github.com/Lakshmipathi/dduper/pull/42, and tested it locally. If you installed dduper from source, you can git pull and try it.

I still need to fix the issues related to sub-volumes.

Lakshmipathi commented on Sep 18 '20
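
For context only (this is not taken from PR #42): btrfs stores a fixed number of checksum bytes per 4 KiB data block, and that size depends on the csum algorithm, so a parser that hard-codes crc32c's 4 bytes will walk a blake2/xxhash64/sha256 dump at the wrong stride. A hypothetical sketch of the per-type sizing:

# Bytes of checksum stored per 4 KiB data block for each btrfs csum type.
CSUM_SIZE = {'crc32c': 4, 'xxhash64': 8, 'sha256': 32, 'blake2b': 32}

def csum_bytes_for_file(file_size, csum_type, block_size=4096):
    # Number of data blocks, rounded up, times the per-block csum size.
    blocks = (file_size + block_size - 1) // block_size
    return blocks * CSUM_SIZE[csum_type]

# A 1 MiB file carries 1 KiB of crc32c csums but 8 KiB of blake2b csums.
print(csum_bytes_for_file(1 << 20, 'crc32c'), csum_bytes_for_file(1 << 20, 'blake2b'))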

Released dduper v0.04 with new checksum support. It should be available via all installation methods.

Lakshmipathi commented on Sep 18 '20