fclones

Limit deduplication to files that have not been deduplicated yet

Status: Open · patrickwolf opened this issue 1 year ago · 2 comments

This is a feature request for the fclones dedupe feature.

Currently, every run creates new reflinks even for files that have already been deduplicated. This also makes the estimate of how much space is wasted inaccurate.
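To illustrate why the estimate is off, here is a minimal sketch of an extent-aware estimate. It assumes each file's layout is available as a list of `(physical_start, length)` extents in filesystem blocks, which is what the `filefrag -v` output in this issue reports. It also ignores blocks shared with files outside the duplicate group. `naive_wasted_bytes`, `unique_physical_blocks` and `reclaimable_bytes` are hypothetical helpers, not fclones functions.

```python
from typing import List, Tuple

# (first physical block, length in blocks), e.g. one row of `filefrag -v`
Extent = Tuple[int, int]


def naive_wasted_bytes(file_size: int, copies: int) -> int:
    """Estimate that assumes every extra copy occupies its own storage."""
    return (copies - 1) * file_size


def unique_physical_blocks(extent_maps: List[List[Extent]]) -> int:
    """Count physical blocks referenced by the group, each block only once."""
    intervals = sorted((start, start + length)
                       for extents in extent_maps
                       for start, length in extents)
    total = 0
    cur_start = cur_end = None
    for start, end in intervals:
        if cur_end is None or start > cur_end:
            # Disjoint from the current run: close it and start a new one.
            if cur_end is not None:
                total += cur_end - cur_start
            cur_start, cur_end = start, end
        else:
            # Overlapping interval (e.g. a reflinked extent): extend the run.
            cur_end = max(cur_end, end)
    if cur_end is not None:
        total += cur_end - cur_start
    return total


def reclaimable_bytes(extent_maps: List[List[Extent]], file_size: int,
                      block_size: int) -> int:
    """Space freed if the group were fully deduplicated to a single copy."""
    blocks_per_copy = -(-file_size // block_size)  # ceiling division
    excess = unique_physical_blocks(extent_maps) - blocks_per_copy
    return max(0, excess) * block_size
```

In the `filefrag -v` output in this issue, both files map to the same two extents, `(32208319704, 65536)` and `(32208385681, 46506)`. For that pair, `naive_wasted_bytes(458923952, 2)` gives 458923952, whereas `reclaimable_bytes` gives 0 because nothing is left to reclaim.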

There seem to be at least two possible solutions:

  1. Record in the cache that a file has been deduplicated, and don't attempt it again (this could also fix the storage estimate)
  2. Check the extents of each file to see whether it has already been deduplicated, and only attempt deduplication again if it isn't fully deduplicated yet
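Solution 1 might look roughly like the minimal sketch below. It is not fclones' actual cache, whose format this issue does not describe. `DedupeCache`, its JSON file and the fingerprint fields are all hypothetical. In this sketch, a file counts as already deduplicated only while its device, inode, size and modification time still match what was recorded right after deduplication.

```python
import json
import os
from typing import Dict, List


class DedupeCache:
    """Hypothetical on-disk record of files that were already deduplicated.

    This is NOT fclones' real cache format. An entry stays valid only while
    the file's device, inode, size and modification time are unchanged.
    """

    def __init__(self, cache_path: str) -> None:
        self.cache_path = cache_path
        self.entries: Dict[str, List[int]] = {}
        if os.path.exists(cache_path):
            with open(cache_path) as f:
                self.entries = json.load(f)

    @staticmethod
    def _fingerprint(file_path: str) -> List[int]:
        st = os.stat(file_path)
        return [st.st_dev, st.st_ino, st.st_size, st.st_mtime_ns]

    def already_deduplicated(self, file_path: str) -> bool:
        """True if the file is unchanged since it was marked as deduplicated."""
        entry = self.entries.get(os.path.abspath(file_path))
        return entry is not None and entry == self._fingerprint(file_path)

    def mark_deduplicated(self, file_path: str) -> None:
        """Meant to be called right after a successful dedupe of this file."""
        self.entries[os.path.abspath(file_path)] = self._fingerprint(file_path)

    def save(self) -> None:
        with open(self.cache_path, "w") as f:
            json.dump(self.entries, f)
```

A dedupe run would then skip every file for which `already_deduplicated` returns True. Keying entries on individual files is a simplification. Deduplication is really a relationship between members of a group, so a real design might also need to record which group each file was deduplicated with.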

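For solution 2, filefrag -v already exposes the needed information. Here both copies report the same physical extents, and every extent is flagged as shared: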

```
root@ubuntu1:/ex2/_Data# filefrag -v fclones.json fclones2.json
Filesystem type is: 9123683e
File size of fclones.json is 458923952 (112042 blocks of 4096 bytes)
 ext:     logical_offset:        physical_offset: length:   expected: flags:
   0:        0..   65535: 32208319704..32208385239:  65536:             shared
   1:    65536..  112041: 32208385681..32208432186:  46506: 32208385240: last,shared,eof
fclones.json: 2 extents found
File size of fclones2.json is 458923952 (112042 blocks of 4096 bytes)
 ext:     logical_offset:        physical_offset: length:   expected: flags:
   0:        0..   65535: 32208319704..32208385239:  65536:             shared
   1:    65536..  112041: 32208385681..32208432186:  46506: 32208385240: last,shared,eof
fclones2.json: 2 extents found
root@ubuntu1:/ex2/_Data#
```

Ref:

The cache might be easier to start with, while checking the extents would be cooler :) and more future-proof.

Thanks for considering it

patrickwolf, Apr 10 '23 22:04

Using the existing cache is the practical approach, since it would not require adding more low-level Linux syscalls.

th1000s, Apr 18 '23 22:04

@pkolaczk what do you think of adding deduplication information to the cache?

patrickwolf, Apr 25 '23 15:04