bees misses some duplicate extents
bees does not deduplicate all data and this is reproducible.
How to reproduce:
1. Create an empty 10GB btrfs filesystem.
2. Create a random 1GB file.
3. Copy the file to a second file.
4. Run bees (and wait for deduplication to complete): only 3/4 of the copy is deduplicated (256MB left).
5. Copy the file once again.
6. Run bees (and wait for deduplication to complete): 128MB of each file is not deduplicated.
7. Remove `beeshash.dat` and restart bees: suddenly all data is deduplicated.
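For convenience, here are the same steps collected from the exact commands in the log below, so they can be pasted in one go (a sketch: the LVM volume group name, mount point, and sizes are from my setup, and each bees run has to be interrupted manually once it goes idle):

```sh
# 1. Create an empty 10GB btrfs filesystem (LVM volume group "mirror", as in the log)
lvcreate -L10G -n test /dev/mirror
mkfs.btrfs /dev/mirror/test
mount /dev/mirror/test /mnt/test

# 2-3. Create a random 1GB file and copy it
dd if=/dev/random of=/mnt/test/file1 bs=1M count=1024
cp /mnt/test/file1 /mnt/test/file2
sync

# Prepare a fresh 100MB bees hash table
export BEESHOME=/tmp/bees
mkdir -p "$BEESHOME"
truncate -s 100M "$BEESHOME/beeshash.dat"

# 4. Run bees (interrupt it once it goes idle), then check sharing
/usr/sbin/bees /mnt/test
btrfs filesystem du /mnt/test

# 5-6. Copy the file again and repeat
cp /mnt/test/file2 /mnt/test/file3
sync
/usr/sbin/bees /mnt/test
btrfs filesystem du /mnt/test

# 7. Reset the hash table and run bees once more: now everything is deduplicated
rm -rf "$BEESHOME"/* && truncate -s 100M "$BEESHOME/beeshash.dat"
/usr/sbin/bees /mnt/test
btrfs filesystem du /mnt/test
```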
Detailed log:
[root@vice:/mnt]$ uname -a
Linux vice 5.10.0-9-amd64 #1 SMP Debian 5.10.70-1 (2021-09-30) x86_64 GNU/Linux
[root@vice:/mnt]# lvcreate -L10G -n test /dev/mirror
Logical volume "test" created.
[root@vice:/mnt]# mkfs.btrfs /dev/mirror/test
btrfs-progs v5.10.1
See http://btrfs.wiki.kernel.org for more information.
Detected a SSD, turning off metadata duplication. Mkfs with -m dup if you want to force metadata duplication.
Label: (null)
UUID: a67c0bc7-ba45-41b7-b607-23d9577fc4fd
Node size: 16384
Sector size: 4096
Filesystem size: 10.00GiB
Block group profiles:
Data: single 8.00MiB
Metadata: single 8.00MiB
System: single 4.00MiB
SSD detected: yes
Incompat features: extref, skinny-metadata
Runtime features:
Checksum: crc32c
Number of devices: 1
Devices:
ID SIZE PATH
1 10.00GiB /dev/mirror/test
[root@vice:/mnt]# mount /dev/mirror/test /mnt/test
[root@vice:/home/gq]# dd if=/dev/random of=/mnt/test/file1 bs=1M count=1024
1073741824 bytes (1,1 GB, 1,0 GiB) copied, 3,49458 s, 307 MB/s
[root@vice:/mnt]# sync
[root@vice:/mnt]# btrfs filesystem du /mnt/test
Total Exclusive Set shared Filename
1.00GiB 1.00GiB - /mnt/test/file1
1.00GiB 1.00GiB 0.00B /mnt/test
[root@vice:/mnt]# cp /mnt/test/file1 /mnt/test/file2
[root@vice:/mnt]# sync
[root@vice:/mnt]# btrfs filesystem du /mnt/test
Total Exclusive Set shared Filename
1.00GiB 1.00GiB - /mnt/test/file1
1.00GiB 1.00GiB - /mnt/test/file2
2.00GiB 2.00GiB 0.00B /mnt/test
[root@vice:/mnt]# export BEESHOME=/tmp/bees
[root@vice:/mnt]# mkdir $BEESHOME
[root@vice:/mnt]# truncate -s 100M ${BEESHOME}/beeshash.dat
[root@vice:/mnt]# /usr/sbin/bees /mnt/test
https://gist.github.com/gerasiov/8894de358b0e42ba8ecee481336851f9
[root@vice:/mnt]# btrfs filesystem du /mnt/test
Total Exclusive Set shared Filename
1.00GiB 256.00MiB - /mnt/test/file1
1.00GiB 256.00MiB - /mnt/test/file2
2.00GiB 512.00MiB 768.00MiB /mnt/test
[root@vice:/mnt]# cp /mnt/test/file2 /mnt/test/file3
[root@vice:/mnt]# sync
[root@vice:/mnt]# btrfs filesystem du /mnt/test
Total Exclusive Set shared Filename
1.00GiB 256.00MiB - /mnt/test/file1
1.00GiB 256.00MiB - /mnt/test/file2
1.00GiB 1.00GiB - /mnt/test/file3
3.00GiB 1.50GiB 768.00MiB /mnt/test
[root@vice:/mnt]# /usr/sbin/bees /mnt/test
https://gist.github.com/gerasiov/b4d2541fe4c695c0f61d06b14894b060
[root@vice:/mnt]# btrfs filesystem du /mnt/test
Total Exclusive Set shared Filename
1.00GiB 128.00MiB - /mnt/test/file1
1.00GiB 128.00MiB - /mnt/test/file2
1.00GiB 128.00MiB - /mnt/test/file3
3.00GiB 384.00MiB 896.00MiB /mnt/test
[root@vice:/mnt]# rm -rf ${BEESHOME}/* && truncate -s 100M ${BEESHOME}/beeshash.dat
zsh: sure you want to delete all 3 files in /tmp/bees [yn]? y
[root@vice:/mnt]# /usr/sbin/bees /mnt/test
https://gist.github.com/gerasiov/48dd93e93214dc395cf3501df541498e
[root@vice:/mnt]# btrfs filesystem du /mnt/test
Total Exclusive Set shared Filename
1.00GiB 0.00B - /mnt/test/file1
1.00GiB 0.00B - /mnt/test/file2
1.00GiB 0.00B - /mnt/test/file3
3.00GiB 0.00B 1.00GiB /mnt/test
`/usr/sbin/bees` is the binary in my example (I package bees a little differently in my environment).
The test case is not very representative. Most filesystems do not consist entirely of pairs of duplicate files that are otherwise unique, where each file contains fewer extents than there are cores in the CPU. Even filesystems that are structured that way still have a >95% dedupe hit rate: I ran this test 100 times and failed to dedupe 10 out of 800 total extents (8 extents per run), for a success rate of 98.75%. Admittedly, in this corner case, the reasons for not getting all the way to 100% are bad, but any success rate over 95% is still OK for the current bees design.
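For reference, the quoted success rate follows directly from those numbers:

$$\frac{800 - 10}{800} = \frac{790}{800} = 98.75\%$$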
A tiny data set like this one will hit several known issues in the current design:
- Hash insertion and lookup are not atomic wrt an extent. If two threads scan identical copies of novel unique data at exactly the same time, they will not detect matches in each other's data. This seems to be what happened in the first extent (offset 0 in `file1` and `file2`). Since each extent ref is scanned only once, the duplicated data is not detected until `file3` comes along, or `beescrawl.dat` is reset; in those cases bees would repeat the scan and find the hashes inserted by the previous run (see the sketch after this list). In a more typical filesystem this event is extremely rare.
- The `ExtentWalker` class will throw an exception if a reverse extent search fails due to an inconsistent view of metadata. If dedupe changes a neighbouring extent while the search is running, `ExtentWalker` throws, and processing of the extent is abandoned. This happened in one of my test runs but not yours.
- A related issue is dramatically reduced performance in btrfs if dedupe and `LOGICAL_INO` operate on the same parts of the metadata tree (it results in a large number of loops in the tree mod log). The solution in both cases is to provide separation in time between the identification of identical extents and the dedupe commands that remove them. Other dedupers achieve this easily by having completely separate scan and dedupe phases, at the cost of much more RAM usage.
- In your second log, two threads found the same duplicate data in the same extent at the same time; however, the two source extent references were different lengths (due to a kernel issue in dedupe which is mostly harmless except for this effect), so the threads did not exclude each other from the same destination. This was caught just before the dedupe, and an exception prevented processing of the second extent. I'm not sure that any dedupe failed to occur here (it would seem that bees merely prevented deduping an extent that was already deduped with itself).
- Matching on 128M extents takes multiple seconds, exacerbating item 1.
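A side note on the first item above: since resetting `beescrawl.dat` makes bees repeat the scan against hashes that are already in the table, a lighter-weight variant of step 7 in the reproduction would reset only the crawl state instead of the whole hash table. A sketch, assuming the `$BEESHOME` layout from the log (stop bees before doing this):

```sh
export BEESHOME=/tmp/bees
# Keep beeshash.dat (the hashes inserted on the previous pass); drop only the crawl position
rm -f "$BEESHOME/beescrawl.dat"
# Restart bees: it rescans from the beginning and now finds the earlier hashes
/usr/sbin/bees /mnt/test
```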
Although these problems are individually fixable, the current crawler design has many more problems not listed above, and significant rework is required to gain even tiny improvements over the current implementation. Instead, the existing crawler code will be replaced by a new design which avoids the above issues from the start.
Very interesting!
Is there an issue I can subscribe to, or some other way to get notified, when you start work on the new design so I can try it out?