Support for deduplication

Open Evengard opened this issue 5 years ago • 10 comments

Hello!

I read through the list of features and didn't find deduplication. I'm looking for a cross-platform filesystem with online (or periodic) deduplication, and btrfs seems to be one of the candidates.

Does WinBtrfs support deduplication in some way or another?

Evengard avatar May 15 '20 10:05 Evengard

It's something I'm interested in, but I still need to decide how we'd go about it. The essential problem of deduplication is that you need horrendous amounts of RAM, for potentially very little gain.

maharmstone avatar May 17 '20 12:05 maharmstone

For now I'm using the Windows dedup feature (the one from Windows Server), which works as a "periodic dedup". It has options for limiting RAM usage; I set it to 25% (I have 20 GB of RAM) and it works wonders: my 2 TB storage drive holds about 3.2 TB of data thanks to dedup and still has about 250 GB free. The one drawback is that it is not cross-platform. 25% is about 5 GB, which seems fine to me, but that is for periodic dedup, which can always be delayed when memory is needed elsewhere; the numbers may be different for online dedup.

Evengard avatar May 17 '20 13:05 Evengard

It would be great if something like this were supported: "offline" / "out-of-band" deduplication, see https://btrfs.wiki.kernel.org/index.php/Deduplication

Yes, "online" or "real-time" (in-band) deduplication is unnecessary and would eat up too many resources all the time.

I also currently use the Windows dedup feature on my 2 TB SSD, and it saves about 353 GB. The savings could be much higher with btrfs zstd + dedup.

Enabled            UsageType          SavedSpace           SavingsRate          Volume
-------            ---------          ----------           -----------          ------
True               Default            353 GB               20 %                 D:

theChaosCoder avatar Jun 05 '20 08:06 theChaosCoder

It's important to note that deduplication should still be possible on a mounted volume, because unmounting every time would not be very practical.

Evengard avatar Jun 08 '20 11:06 Evengard

There is an offline dedupe tool called duperemove. It hashes all the files on a mounted btrfs filesystem in blocks of a specified size and writes the hashes to a sqlite file (which can be saved), then finds duplicates and makes the files share the duplicated data, which frees the now-redundant space. Subsequent runs can use the saved sqlite file so it doesn't have to rehash everything.
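The hashing and duplicate-finding step described above can be sketched roughly as follows. This is only an illustration of the idea, not duperemove's actual code: the function names `hash_blocks` and `find_duplicate_blocks`, the use of SHA-256, and the default block size are all assumptions made for the example, and it only reports candidate duplicates without deduplicating anything.

```python
import hashlib
from collections import defaultdict


def hash_blocks(path, block_size=128 * 1024):
    """Yield (offset, digest) for each block of the file at `path`.

    Hypothetical helper: SHA-256 and the block size are illustrative choices.
    """
    with open(path, "rb") as f:
        offset = 0
        while True:
            block = f.read(block_size)
            if not block:
                break
            yield offset, hashlib.sha256(block).hexdigest()
            offset += len(block)


def find_duplicate_blocks(paths, block_size=128 * 1024):
    """Return {digest: [(path, offset), ...]} for blocks seen more than once."""
    index = defaultdict(list)
    for path in paths:
        for offset, digest in hash_blocks(path, block_size):
            index[digest].append((path, offset))
    # Keep only digests that occur at two or more locations.
    return {d: locs for d, locs in index.items() if len(locs) > 1}
```

A real tool would persist the index (duperemove uses a sqlite file for that, as noted above) and then ask the filesystem to share the matching ranges instead of just listing them.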

I think this should work with any compliant btrfs filesystem.

Wondering if anyone has tried this on this implementation as it would suit my needs fine.

I think the comment above by @Evengard misunderstands offline dedupe: it just means deduplication runs as a scheduled or manual job, as opposed to 'real-time' or inline deduplication, which keeps the block hashes in memory, using RAM and CPU, and dedupes blocks as they are written.

sedlund avatar May 17 '23 01:05 sedlund

Since I created this ticket, my understanding of deduplication has indeed improved quite a bit.

I guess the way I phrased this ticket puts it mostly out of scope for this project, but the driver may still need to provide some support for such jobs (for example, the FIDEDUPERANGE ioctl).

I guess that also answers the project author's question about how to tackle this issue: the author DOES NOT need to implement deduplication in full (at least not within the scope of this project), but only needs to make the necessary APIs available to third-party software. This is exactly how it is done for btrfs on Linux with the FIDEDUPERANGE ioctl, and the NTFS dedup feature probably works the same way.
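For reference, a minimal sketch of how third-party software calls FIDEDUPERANGE on Linux, using only Python's standard library. The struct layouts and the request number follow the Linux `struct file_dedupe_range` and `struct file_dedupe_range_info` definitions; the helper names `build_request`, `parse_result`, and `dedupe_range` are made up for this example. This shows the Linux side only and says nothing about what WinBtrfs exposes on Windows.

```python
import fcntl  # Linux/Unix only
import struct

# _IOWR(0x94, 54, struct file_dedupe_range), where the struct header is 24 bytes.
FIDEDUPERANGE = 0xC0189436

# struct file_dedupe_range: src_offset, src_length, dest_count, reserved1, reserved2
HEADER = struct.Struct("=QQHHI")
# struct file_dedupe_range_info: dest_fd, dest_offset, bytes_deduped, status, reserved
INFO = struct.Struct("=qQQiI")


def build_request(src_offset, length, dest_fd, dest_offset):
    """Pack a request that targets a single destination range (hypothetical helper)."""
    header = HEADER.pack(src_offset, length, 1, 0, 0)
    info = INFO.pack(dest_fd, dest_offset, 0, 0, 0)
    return bytearray(header + info)


def parse_result(buf):
    """Return (bytes_deduped, status) from the buffer the kernel filled in."""
    _fd, _off, bytes_deduped, status, _res = INFO.unpack_from(buf, HEADER.size)
    return bytes_deduped, status


def dedupe_range(src_fd, src_offset, length, dest_fd, dest_offset):
    """Ask the kernel to share `length` bytes between two open files.

    The kernel compares both ranges itself and only shares them if they are
    identical. Status 0 means the ranges matched, 1 means they differed, and a
    negative value is an errno.
    """
    buf = build_request(src_offset, length, dest_fd, dest_offset)
    fcntl.ioctl(src_fd, FIDEDUPERANGE, buf, True)  # True: kernel writes results into buf
    return parse_result(buf)
```

Because the kernel does the comparison, the calling tool only has to propose candidate ranges, which is why the driver does not need a full deduplication engine of its own.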

Evengard avatar May 17 '23 02:05 Evengard

IIRC there are Windows equivalents of the necessary ioctls, which are supported here. It just needs some userspace support.

maharmstone avatar May 17 '23 02:05 maharmstone

I guess some kind of documentation would then be highly appreciated, because from browsing the source code I can't find anything related to the FIDEDUPERANGE ioctl or anything similar.

Evengard avatar May 17 '23 02:05 Evengard