btrfs
Support for deduplication
Hello!
I didn't find deduplication in the list of features. I'm looking for a cross-platform filesystem with online (or periodic) deduplication, and btrfs seems to be one of the candidates.
Does WinBtrfs support deduplication in some way or another?
It's something I'm interested in, but I still need to decide how we'd go about it. The essential problem of deduplication is that you need horrendous amounts of RAM, for potentially very little gain.
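To put a rough number on the RAM concern, here is a back-of-the-envelope sketch for an inline dedup index that keeps one entry per block in memory. The 4 KiB block size, the 32-byte digest, the 8-byte block address, and the function name are all assumptions chosen for illustration; real implementations use different block sizes and index layouts.

```python
# Rough estimate of the in-memory index size for inline deduplication.
# All sizes below are illustrative assumptions, not measurements.

TIB = 1024 ** 4
GIB = 1024 ** 3


def inline_dedup_index_bytes(volume_bytes, block_size, entry_bytes):
    """Return the bytes needed to hold one index entry per block."""
    blocks = volume_bytes // block_size
    return blocks * entry_bytes


# Assumed: 2 TiB volume, 4 KiB blocks, 32-byte hash + 8-byte block address.
estimate = inline_dedup_index_bytes(2 * TIB, 4096, 32 + 8)
print(f"{estimate / GIB:.1f} GiB")  # prints "20.0 GiB"
```

Under these assumptions the index alone is as large as the RAM of a typical desktop, which is why larger block sizes or out-of-band approaches are attractive.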
For now I'm using the Windows dedup feature (the one from Windows Server), which works as a "periodic dedup". It has options for limiting RAM usage; I set it to 25% (I have 20 GB of RAM) and it works wonders: my 2 TB storage drive holds about 3.2 TB of data thanks to dedup and still has about 250 GB free. The one drawback is that it is not cross-platform. 25% is about 5 GB, which seems fine to me, but things may be different for online dedup, because a periodic job can always be delayed when the memory is needed elsewhere.
It would be great if something like this were supported: "offline" / "out-of-band" deduplication https://btrfs.wiki.kernel.org/index.php/Deduplication
Yes, "online" or "real-time" (in-band) deduplication is unnecessary and would eat up too many resources all the time.
I also currently use the Windows dedup feature on my 2 TB SSD, and it saved about 353 GB. It could be much more with btrfs zstd + dedup.
Enabled  UsageType  SavedSpace  SavingsRate  Volume
-------  ---------  ----------  -----------  ------
True     Default    353 GB      20 %         D:
It's important to note that deduplication should still be possible on a mounted volume, because unmounting each time would not be very practical.
There is an offline dedupe tool called duperemove. It hashes all the files on a mounted btrfs filesystem in blocks of a specified size and writes the hashes to a SQLite file (which can be saved). It then finds the dupes, links the duplicate data together so that it is stored only once, and frees the now-unused space. Subsequent runs can reuse the saved SQLite file so it doesn't have to rehash everything.
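The hashing step described above can be sketched roughly as follows. This is not duperemove's actual code: the function names, the 128 KiB default block size, and the choice of SHA-256 are assumptions for illustration, and the table is kept in memory instead of being written to a SQLite file.

```python
import hashlib
from collections import defaultdict

BLOCK_SIZE = 128 * 1024  # assumed default; the real tool makes this configurable


def hash_blocks(path, block_size=BLOCK_SIZE):
    """Yield (offset, digest) for each block of the file (the last may be shorter)."""
    with open(path, "rb") as f:
        offset = 0
        while True:
            block = f.read(block_size)
            if not block:
                break
            yield offset, hashlib.sha256(block).hexdigest()
            offset += len(block)


def find_duplicate_blocks(paths, block_size=BLOCK_SIZE):
    """Map each digest seen more than once to its list of (path, offset) locations."""
    by_digest = defaultdict(list)
    for path in paths:
        for offset, digest in hash_blocks(path, block_size):
            by_digest[digest].append((path, offset))
    return {d: locs for d, locs in by_digest.items() if len(locs) > 1}
```

This only identifies candidate blocks. Actually sharing the storage is a separate step that has to go through the filesystem, which should also compare the data byte for byte before merging anything.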
I think this should work with any compliant btrfs filesystem.
Wondering if anyone has tried this on this implementation as it would suit my needs fine.
I think the comment above by @Evengard misunderstands offline dedupe: it just means it can be a scheduled or manual job, as opposed to 'real-time' or inline deduplication, which keeps the block hashes in memory, uses RAM and CPU, and dedupes blocks while writing.
Since I created this ticket, my understanding of the deduplication feature has indeed improved quite a bit.
I guess that, the way I phrased it, this ticket is mostly out of scope for this project, but there might still be a need for some driver-side support for such jobs (such as the FIDEDUPERANGE ioctl).
I guess that's also the answer for the project author on how to tackle this issue: the author DOES NOT need to implement the deduplication feature fully (at least not in the scope of this exact project), but only needs to make the necessary APIs available to third-party software (this is exactly how it is done for btrfs on Linux with the FIDEDUPERANGE ioctl, and the NTFS dedup feature probably works the same way).
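For reference, here is a minimal sketch of how a userspace tool can ask the Linux kernel to dedupe one range via FIDEDUPERANGE. It is Linux-only and says nothing about how WinBtrfs exposes this. The ioctl number and struct layouts follow `struct file_dedupe_range` in `<linux/fs.h>`; the helper names `build_request` and `dedupe_range` are made up for this example.

```python
import fcntl
import struct

# _IOWR(0x94, 54, struct file_dedupe_range), as defined in <linux/fs.h>
FIDEDUPERANGE = 0xC0189436

# struct file_dedupe_range: src_offset, src_length, dest_count, reserved1, reserved2
HEADER = struct.Struct("=QQHHI")
# struct file_dedupe_range_info: dest_fd, dest_offset, bytes_deduped, status, reserved
INFO = struct.Struct("=qQQiI")


def build_request(src_offset, length, dest_fd, dest_offset):
    """Pack a request with a single destination range (hypothetical helper)."""
    return bytearray(
        HEADER.pack(src_offset, length, 1, 0, 0)
        + INFO.pack(dest_fd, dest_offset, 0, 0, 0)
    )


def dedupe_range(src_fd, src_offset, length, dest_fd, dest_offset):
    """Ask the kernel to share one range; returns (bytes_deduped, status).

    status is 0 if the ranges matched and were deduped, 1 if the data
    differed, and a negative errno value on a per-range error.
    """
    buf = build_request(src_offset, length, dest_fd, dest_offset)
    fcntl.ioctl(src_fd, FIDEDUPERANGE, buf)  # kernel writes results back into buf
    _, _, bytes_deduped, status, _ = INFO.unpack_from(buf, HEADER.size)
    return bytes_deduped, status
```

The kernel compares the two ranges itself before sharing them, so a userspace tool with a stale or colliding hash cannot corrupt data this way; the destination file has to be opened for writing.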
IIRC there are Windows equivalents of the necessary ioctls, which are supported here. It just needs some userspace support.
I guess then some kind of documentation would be highly appreciated, because from browsing the source code I can't find anything related to the FIDEDUPERANGE ioctl or similar stuff.