
Feature request: increase max block size from 1M to 16M

Open cgm999 opened this issue 2 years ago • 10 comments

Hi,

I am using a patch to increase the block size because it gives better compression. Is it possible to increase it in a future version? I forget what the maximum block size is that does not break the binary optimization related to some marker.

cgm999 avatar Jan 21 '23 10:01 cgm999

Hi,

I am trying to test this feature as well. I modified squashfs_fs.h, which exists in both squashfs-tools and the kernel, and simply changed SQUASHFS_FILE_MAX_SIZE and SQUASHFS_FILE_MAX_LOG to implement it (though I only raised the block size to 8M). The patch really does reduce the size of the squashfs image, but I worry about the impact of this change on read performance.
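
Roughly, the change looks like this (sketched from memory against the current 1 MiB defaults, so the exact values and formatting in the header may differ):

#define SQUASHFS_FILE_MAX_SIZE  8388608  /* was 1048576, i.e. 1 << 20 */
#define SQUASHFS_FILE_MAX_LOG   23       /* was 20; 8 MiB == 1 << 23 */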

maiziyi avatar May 05 '23 03:05 maiziyi

Assuming SquashFS readers don't discard the 3 bits between 1 << 20 (current max block size) and 1 << 24 (compressed bit), it could be increased to 8MiB without breaking anything. With some small changes to readers (a couple lines of code), the block size could be increased to 16MiB. Either choice should probably increment the SquashFS minor version number since existing tooling can currently make the assumption that blocks will be no larger than 1MiB.

This Zig snippet shows the layout of a 32-bit data block reference:

pub const DataEntry = packed struct {
    // Maximum SquashFS block size is 1MiB, which can be
    // represented by a u21
    size: u21,

    // If we use these, 8MiB can now be represented
    UNUSED: u3 = undefined,

    is_uncompressed: bool,
    UNUSED2: u7 = undefined,
};

Technically speaking, the upper 7 bits could even be utilized, but that would be hacky and complicated; they might also be better used for something else.
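
For comparison, here is roughly how the C readers express the same layout; from memory, the relevant mask-and-test macros in squashfs_fs.h look something like this (treat it as a sketch rather than a verbatim copy of the header):

#define SQUASHFS_COMPRESSED_BIT_BLOCK      (1 << 24)

/* A data block is compressed when the marker bit is clear... */
#define SQUASHFS_COMPRESSED_BLOCK(B)       (!((B) & SQUASHFS_COMPRESSED_BIT_BLOCK))

/* ...and its on-disk size is the word with the marker bit masked off. */
#define SQUASHFS_COMPRESSED_SIZE_BLOCK(B)  ((B) & ~SQUASHFS_COMPRESSED_BIT_BLOCK)

Nothing in these macros hard-codes 1 MiB, which is why 8 MiB blocks would decode unchanged; the 1 MiB assumption presumably lives in buffer sizing and sanity checks, which is where the small reader changes needed for 16 MiB would go.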

Increasing the block size might be good for future-proofing but really impacts random access performance on today's computers.

mgord9518 avatar May 02 '24 06:05 mgord9518

For me, 16MB blocks work fine and make the squashfs image smaller. I attached my patch in case someone finds it useful. I also use zstd for good decompression speed.

EDIT: The patch is broken if data is not compressible, so I removed it to avoid any risk of data loss.

To mount images I use https://github.com/vasi/squashfuse, which does not require any patch.

cgm999 avatar May 02 '24 07:05 cgm999

> Increasing the block size might be good for future-proofing but really impacts random access performance on today's computers.

It could be about more than just future-proofing. Surely it's not really the goal of the project, but with a bit of compression improvement and better tooling (I guess mostly libarchive support for "transparent" I/O), squashfs could become a viable replacement for a bunch of tar use cases.

Random access time for tar files is pretty much a worst-case scenario, so even block sizes of hundreds of MiB would be an improvement there. Tar files just have a compression advantage from being compressed as a single stream, and support for all the related formats is quite wide in file managers, even for the not-too-old tzst.

Not sure if this kind of archive use case is ever planned to be "supported", but I've definitely "abused" squashfs as such in a few cases: a large tar file isn't feasible to browse, and 7zip is too dumb to deal with even just symbolic links, while squashfs makes a proper archive that can be browsed (with FUSE mounting at least); it's just not as well compressed as the other options.

voidpointertonull avatar May 02 '24 07:05 voidpointertonull

@cgm999 squashfuse (as well as the Linux kernel) actually would require a patch for this. If a block doesn't compress, the compressed bit will be set and additional logic will need to be used to correctly get the block size. Of course you won't run into this situation often, but once you do it'll make for some hard to track bugs and corrupted reads.

This code already exists in squashfuse; it would just need to be moved into the sqfs_data_header function, which happens to be directly below the function it's currently in.
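
To give a feel for the kind of logic involved (purely a hypothetical sketch in C, not the actual squashfuse or kernel code), a reader that accepts 16 MiB blocks has to special-case the stored-uncompressed path roughly like this:

#include <stdbool.h>
#include <stdint.h>

/* Hypothetical illustration only; assumes bit 24 stays the
 * "stored uncompressed" marker even with 16 MiB blocks. */
#define UNCOMPRESSED_MARKER (1u << 24)

struct block_info {
    uint32_t size;   /* bytes this block occupies on disk */
    bool compressed; /* false if the block was stored as-is */
};

static struct block_info decode_block_ref(uint32_t word, uint32_t block_size)
{
    struct block_info b;

    b.compressed = !(word & UNCOMPRESSED_MARKER);
    b.size = word & ~UNCOMPRESSED_MARKER;

    /* A full uncompressed 16 MiB block cannot encode its size below the
     * marker bit (16 MiB == 1 << 24), so a size of 0 with the marker set
     * has to be read back as one whole block. Without this, a reader
     * silently mis-sizes exactly the blocks that did not compress. */
    if (!b.compressed && b.size == 0)
        b.size = block_size;

    return b;
}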

mgord9518 avatar May 02 '24 07:05 mgord9518

@voidpointertonull Yeah, I'm not gonna argue that. It could be useful for long-term backups where random access speed isn't super important.

Although once you break 16MiB, SquashFS would need some major format changes, which obviously wouldn't be compatible with current tooling.

mgord9518 avatar May 02 '24 07:05 mgord9518

@mgord9518 Ah yes, I guess I never hit the case where a block is not compressed and stored as-is, and I have been using this patch for years (it's the same type of data, which I guess explains why I never hit the issue, and I do compare the source with the mounted squashfs via FUSE before removing the source).

cgm999 avatar May 02 '24 09:05 cgm999

@mgord9518 The compressed bit was set at 1 << 24 to allow for increases in the maximum block size, if required. The max block size of 1M was chosen in 2009 because the only compression algorithm at the time (in the kernel) was gzip and that can't make good use of 1M blocks anyway, because the window size is too small. xz/zstd can and increasing the maximum block size is already planned for the next major release (4.7).

plougher avatar May 02 '24 10:05 plougher

@plougher Nice, I'm excited to mess with larger block sizes. Why was 1<<24 chosen over 1<<31?

mgord9518 avatar May 02 '24 11:05 mgord9518

> @plougher Nice, I'm excited to mess with larger block sizes. Why was 1<<24 chosen over 1<<31?

It left the upper bits free for other uses.

plougher avatar May 02 '24 11:05 plougher