Block level deduplication

Open szycha76 opened this issue 5 years ago • 9 comments

File-level deduplication currently present in squashfs is great and surely saves a lot of space in various scenarios. Could block-level deduplication be added without breaking compatibility with the current format?

mksquashfs would take significantly longer to complete (especially with small block sizes and big data sets), but could also save much more space with e.g. snapshots of system disk images and other big, mostly similar files.

If a backward-compatible change is not possible, could a format upgrade/change be investigated for that? Block-level deduplication has several drawbacks, too; internal fragmentation is one of them, of course.

szycha76 avatar Feb 06 '19 12:02 szycha76

Files in squashfs are represented basically as a pointer to a start block plus a list of the sizes of the blocks that follow. Because of this assumption that all blocks in a file are located one after the other, block-level deduplication would require some level of change to the format.
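To illustrate what that representation roughly looks like, here is a simplified sketch. It is not the real on-disk layout; field names and widths are illustrative only, and the point is just that block offsets are implied rather than stored per block:

```c
/* Simplified illustration of how a regular file is described: one
 * starting byte offset into the data area, followed by the compressed
 * sizes of its blocks, which are assumed to be stored back to back.
 * Illustrative only, not the real squashfs on-disk structures. */
#include <stdint.h>
#include <stdio.h>

struct file_layout {
	uint64_t start_block;     /* byte offset of the first data block */
	uint32_t nblocks;         /* number of blocks in the file */
	uint32_t block_sizes[4];  /* compressed size of each block */
};

/* The byte offset of block i is implied: start + sum of earlier sizes. */
static uint64_t block_offset(const struct file_layout *f, uint32_t i)
{
	uint64_t off = f->start_block;

	for (uint32_t n = 0; n < i; n++)
		off += f->block_sizes[n];
	return off;
}

int main(void)
{
	struct file_layout f = { 96, 4, { 9000, 8500, 9100, 4000 } };

	printf("block 2 starts at offset %llu\n",
	       (unsigned long long)block_offset(&f, 2));
	return 0;
}
```

Since only the start offset is stored, a file's blocks cannot currently point at arbitrary shared locations, which is why block-level sharing needs a format change.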

Dr-Emann avatar Feb 06 '19 15:02 Dr-Emann

In order to avoid impacting all files, a new file type could be introduced (assuming anyone would like to implement it) - say a 'fragmented file', with a list of extents making up the file.
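Purely as a sketch of that idea (nothing like this exists in the format today; all names and fields are hypothetical), such a file type could carry an explicit extent list instead of a single start block:

```c
/* Hypothetical 'fragmented file' representation: instead of one start
 * block, the file is a list of extents, each pointing at a run of
 * blocks that may be shared with other files.  Purely illustrative. */
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

struct extent {
	uint64_t start_block;    /* byte offset of the extent's first block */
	uint32_t nblocks;        /* number of consecutive blocks */
};

struct fragmented_file {
	uint64_t file_size;
	uint32_t nextents;
	struct extent extents[]; /* flexible array of extents */
};

int main(void)
{
	/* A file built from two extents, possibly shared with other files. */
	struct fragmented_file *f =
		malloc(sizeof(*f) + 2 * sizeof(struct extent));

	f->file_size = 3 * 131072;
	f->nextents = 2;
	f->extents[0] = (struct extent){ .start_block = 4096, .nblocks = 2 };
	f->extents[1] = (struct extent){ .start_block = 900000, .nblocks = 1 };
	printf("extents: %u\n", f->nextents);
	free(f);
	return 0;
}
```

Regular files would keep the existing compact layout, and only deduplicated files would pay the extra metadata cost of the extent list.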

szycha76 avatar Feb 07 '19 16:02 szycha76

I am so hoping to see this feature implemented. I've started to use squashfs on my Mac mini to archive up stuff like sample CDs and emulator games, and it's so cool to be able to simply mount these highly-compressed images at will and share them via Samba.

Currently, I'm clearing up a crapton of old PSP game ISOs, and the duplication level here is off the charts. It's not just the amount of data shared between most, if not all, of the images, such as firmware update files and boot loaders. There are also often many different versions of each game, either because of regional and localization differences, or because of "rips," where someone removed non-vital files from an image to make it smaller. These latter examples are 100% duplicates in terms of blocks in the image, but won't compress as such due to the nature and often the size of the ISOs.

I was considering creating a dedicated ZFS pool with dedup enabled, setting the block size to 2048 bytes to match that of a CD ISO, but that's not really realistic. It would create an insanely large DDT due to the very small block size (not to mention the compression would suffer as well), which would require a ridiculous amount of memory. I believe the DDT overhead is also stored on disk (since ZFS is a r/w fs, it needs the dedup table readily available for writes), and the per-block metadata is not negligible, so it might counteract the benefits of deduplication to the point of the actual space saving being negligible.
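To give a rough sense of the scale, here is a back-of-the-envelope sketch. The ~320 bytes per DDT entry is only a commonly quoted ballpark, and the 1 TiB data size is an arbitrary example, not a figure from this thread:

```c
/* Back-of-the-envelope DDT size estimate for 2048-byte records.
 * Assumptions (illustrative): 1 TiB of data, ~320 bytes per DDT entry. */
#include <stdio.h>

int main(void)
{
	unsigned long long data_bytes = 1ULL << 40; /* 1 TiB of ISOs (example) */
	unsigned long long recordsize = 2048;       /* CD sector size */
	unsigned long long ddt_entry  = 320;        /* rough per-entry overhead */

	unsigned long long blocks = data_bytes / recordsize;

	printf("blocks: %llu\n", blocks);
	printf("estimated DDT size: %.0f GiB\n",
	       (double)(blocks * ddt_entry) / (double)(1ULL << 30));
	return 0;
}
```

Under those assumptions that's on the order of 160 GiB of dedup table per terabyte of 2 KiB records, which is what makes the idea unrealistic.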

Having a read-only alternative to this, where you don't need massive amounts of memory to hold a dedup table but instead rely on data pointers to serve the right block just in time when reading, would be magical.

I'd then either create that squashfs image without compression and put it on a compressed fs, or squash it once more with a 1 MB block size using squashfs zstd:22 compression.

Man, that'd be sweet 💌

DanielSmedegaardBuus avatar Oct 12 '23 10:10 DanielSmedegaardBuus

This will get a bit off topic for the issue, but I'm sharing my experience in case it helps.

I had roughly the same idea earlier, looking for a modern (read-only) archival format, because tar really doesn't do well without at least a file index, and 7z apparently has design weaknesses that make it unreliable for archiving. I checked out squashfs as it seemed to have quite relevant properties, and even though the file manager integration was lacking for handling archives as conveniently as with some other common formats, the FUSE mount option seemed quite okay for me.

The deal-breaker I ran into was the archive size, which is likely limited by the block size, but I didn't get around to rebuilding the tools with an increased size to verify that. Of course I wasn't expecting it to match tar+zstd:22, although with deduplication it could even beat that for large archives, since duplicates can appear outside zstd's window; but for my use case I didn't seem to be gaining much while giving up the convenience of easy-to-access files. Do note though that my baseline is Btrfs with zstd:1-3 (depending on use case), which is actually pretty decent for saving space already and provides on-demand deduplication too, without the resource needs of automatic deduplication. Be aware though that anything more demanding than zstd:3 is often not desirable, as the compression is done by the kernel, so:

  • The kernel usually isn't too keen on preemption, so on interactive systems ("desktop") file I/O (especially writing) leads to stuttering, including audio popping
  • I/O bandwidth for writing easily gets limited by CPU usage; even network bandwidth can be hard to max out with zstd:22
  • Once the CPU is the bottleneck, strange waves of I/O usage start appearing, likely every time the kernel decides it's time to do a blocking flush of dirty pages: the program doing the writing gets blocked for a while, then when it regains control the kernel lets dirty pages accumulate again for some time before starting to compress and write, eventually getting overwhelmed again

I remain interested in the feasibility of using squashfs for (optionally read-only) archival that focuses more on size than performance, given that I haven't found a suitable format. Judging mostly from the quite low block size cap (#217) and from file managers not having integrations to browse squashfs archives, I'd guess though that this use case is currently considered unusual.

voidpointertonull avatar Oct 13 '23 17:10 voidpointertonull

@Dr-Emann So block-level deduplication isn't possible, but a file that's a subset of another should be, right?

Imagine a file with blocks [ A, B, C, D ]. A file composed of [ A, B, C ] or [ C, D ] should be able to be fully deduplicated.

Of course I'm not really sure what kind of space savings this would bring or how much extra work it would take to do, but it should be possible.
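As a rough illustration of why the existing layout allows this (a conceptual sketch only, not real squashfs code): a prefix like [ A, B, C ] can simply reuse the other file's start block with a shorter size list, and a tail like [ C, D ] can start at a byte offset computed from the preceding compressed sizes.

```c
/* Conceptual sketch only (not real squashfs code): with the
 * "start block + list of compressed block sizes" layout, a file whose
 * blocks are a contiguous subset of another file could share that
 * file's data by pointing into the middle of its block run. */
#include <stdio.h>

int main(void)
{
	/* Compressed sizes of blocks A, B, C, D of the full file. */
	unsigned int sizes[] = { 9000, 8500, 9100, 4000 };
	unsigned long long full_start = 1024;	/* hypothetical byte offset */

	/* [ A, B, C ] is a prefix: same start block, shorter size list. */
	unsigned long long prefix_start = full_start;

	/* [ C, D ] is a tail: skip the compressed bytes of A and B. */
	unsigned long long tail_start = full_start + sizes[0] + sizes[1];

	printf("prefix [A,B,C] starts at %llu\n", prefix_start);
	printf("tail   [C,D]   starts at %llu\n", tail_start);
	return 0;
}
```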

mgord9518 avatar May 01 '24 01:05 mgord9518

@Dr-Emann So block-level deduplication isn't possible, but a file that's a subset of another should be, right?

I designed the layout and wrote the code and so I think you should be asking me that question.

Things like that are possible, and that and other things are still on my "to do" list. It is a question of priorities and time. Every year I get requests for new features, and I can only do some of it. I don't get any money for Squashfs, and all work is done in my "spare time".

plougher avatar May 01 '24 11:05 plougher

@plougher It was just out of curiosity and I didn't want to drag you into a basically dead thread

But I do have a semi off-topic question while you're here: I've been working on a SquashFS implementation, and this code in the Linux kernel appears to set SQUASHFS_COMPRESSED_BIT if a metadata header has a size of 0. Why is that the case? I haven't been able to find any documentation on it.

mgord9518 avatar May 01 '24 14:05 mgord9518

Hmm. That is odd; squashfs-tools has the same code.

That would seem to say "we have an uncompressed metadata block that's 32K coming up". Presumably that would always be immediately rejected, because 32K will be larger than the output buffer, which should always be 8K, unless it's the last block, in which case it will be less than 8K.

Dr-Emann avatar May 01 '24 15:05 Dr-Emann

@plougher It was just out of curiosity and I didn't want to drag you into a basically dead thread

But I do have a semi off-topic question while you're here: I've been working on a SquashFS implementation, and this code in the Linux kernel appears to set SQUASHFS_COMPRESSED_BIT if a metadata header has a size of 0. Why is that the case? I haven't been able to find any documentation on it.

@mgord9518 That relates to Squashfs V1.0 filesystems. V1.0 used the same two-byte block length for data blocks and metadata blocks, and the same code and macros were used for both.

Data blocks in V1.0 could be a maximum of 32K, and an uncompressed 32K block would have SQUASHFS_COMPRESSED_BIT set and nothing else.

In unsquash-1.c you can still see the macro being used to get the size of a V1.0 data block.

https://github.com/plougher/squashfs-tools/blob/master/squashfs-tools/unsquash-1.c#L65
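For reference, the length encoding discussed here boils down to something like the following paraphrased sketch (my own helper names, not copied verbatim from the squashfs headers):

```c
/* Paraphrased sketch of the two-byte length encoding discussed above
 * (not copied from the headers): bit 15 marks "uncompressed", the low
 * 15 bits hold the size, and a stored size of 0 stands in for a full
 * 32K uncompressed block, since 32768 itself does not fit in 15 bits. */
#include <stdint.h>
#include <stdio.h>

#define COMPRESSED_BIT (1 << 15)

static int is_compressed(uint16_t len)
{
	return !(len & COMPRESSED_BIT);
}

static unsigned int block_size(uint16_t len)
{
	unsigned int size = len & ~COMPRESSED_BIT;

	return size ? size : COMPRESSED_BIT;	/* 0 means 32768 bytes */
}

int main(void)
{
	uint16_t len = COMPRESSED_BIT;	/* uncompressed, stored size 0 */

	printf("compressed=%d size=%u\n", is_compressed(len), block_size(len));
	return 0;
}
```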

You do not need to deal with this, as you probably won't need to deal with V1.0 filesystems.

plougher avatar May 01 '24 16:05 plougher