squashfs-tools-ng
rdsquashfs either hangs or is very slow
I'm using squashfs-tools-ng v1.2.0 on Gentoo on an amd64 machine with lots of memory and a Zen 3 chip. I have a 70 MB file that's a squashfs version (lzo compressed) of a 190 MB directory tree with very many small files (146,000 inodes). For some testing I wanted the original uncompressed tree, so I ran "rdsquashfs -qu / foo".
It appeared to run very slowly (without the -q, the screen filled rapidly with the names of files, as expected, but there are rather a lot). After an age I killed it with Ctrl-C. The top level directories appeared to all exist - I don't know if they were fully populated. I repeated the extract, assuming I hadn't given enough time, or something, but it was still running after more than an hour. "top" showed no significant processing; "iotop" showed rdsquashfs was the heaviest I/O consumer, but only doing 100-200 KB/sec (my 5-disk RAID10 system can achieve 400 MB/sec, so it's not that holding it up).
At this point I realised I could do what I wanted by mounting the squash image and reading it as input (Doh!) - I didn't need to run rdsquashfs at all. This was goodness, as I could read and process the entire directory tree in less than a second! But that leaves something weird in rdsquashfs!
I don't know how the squashfs image was created - it's a Gentoo portage snapshot from a Gentoo mirror, for example: https://www.mirrorservice.org/sites/distfiles.gentoo.org/snapshots/squashfs/gentoo-20230713.lzo.sqfs
Hi,
If you are unpacking the entire image, that is going to be slower than mounting it and accessing it. rdsquashfs essentially does the following:
- The entire directory tree is scanned and reconstructed in memory
- It is sorted and sanity checked (i.e. no two files with the same name in a directory; if one of them was a symlink, this could be used for directory traversal, a well known issue with archiving programs)
- The directory tree is recursively created on the output filesystem
- The files are sorted so that the image is accessed mostly sequentially and tail-end blocks don't have to be unpacked several times over
- The files are unpacked.
In contrast, if you mount the image, only step 1 happens, and it happens asynchronously, on demand, as you start traversing directories. If you don't access the file contents, no file blocks have to be unpacked either, only the metadata blocks from the inode and directory tables. The SquashFS kernel driver furthermore has a multi-threaded decompressor queue and caches metadata blocks.
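To illustrate the sanity check from the second step in the list above, here is a minimal Python sketch. It is not the rdsquashfs implementation (which is written in C); the function name find_duplicate_names and the plain list-of-names input are hypothetical and chosen only for illustration. It sorts the entry names of one directory and reports any name that occurs more than once.

```python
def find_duplicate_names(names):
    """Return the sorted list of names that occur more than once.

    `names` is a hypothetical flat list of entry names read from a
    single directory of the image.
    """
    ordered = sorted(names)
    # After sorting, duplicates sit next to each other, so comparing
    # neighbours is enough to find them.
    return sorted({a for a, b in zip(ordered, ordered[1:]) if a == b})


# A crafted image might list "etc" twice: once as a symlink pointing
# outside the target directory, once as a real directory.
listing = ["etc", "lib", "etc"]
duplicates = find_duplicate_names(listing)
if duplicates:
    print("refusing to unpack, duplicate entries:", duplicates)
```

An unpacker that rejects such a listing up front never gets to the point where it would write "etc/passwd" through the symlink.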
If you are only interested in inspecting directory listings, rdsquashfs -l <path> <image> produces a tar-style listing of a selected directory.
Alternatively, rdsquashfs -d <image> produces a listing of the entire image, intended to be compatible with the input format for gensquashfs, i.e. you'll get lines of the shape <type> <path> <mode> <uid> <gid> <extra>. For the image you linked to, producing such a listing takes about a second of pre-processing time on my 6 year old laptop, as it recurses through the directory tree.
Over an hour for a 190 MB directory tree seems excessive though.
unsquashfs unpacks the same image in about 3 seconds, and I gave up after a few minutes with rdsquashfs, so something seems off.
Ah, needed to wait a little more, not seeing over an hour here, but still pretty long:
Executed in 198.97 secs fish external
usr time 3.35 secs 0.00 micros 3.35 secs
sys time 16.46 secs 780.00 micros 16.46 secs
> It is sorted and sanity checked (i.e. no two files with the same name in a directory; if one of them was a symlink, this could be used for directory traversal, a well known issue with archiving programs)
Do you have a testcase for this issue or a malformed sqfs archive?
@Gottox there is an intentionally broken archive in https://github.com/AgentD/squashfs-tools-ng/blob/master/bin/rdsquashfs/test/pathtraversal.sqfs, along with a script that runs rdsquashfs to unpack it and checks if the file in question was created. This test is run by make check along with all the other unit & integration tests.
unsquashfs from squashfs-tools also guards against this kind of issue. There allegedly are "extensive tests" run before releases, but I'm not aware of any publicly available test suites.
Other archivers guard against this as well (e.g. GNU tar, BusyBox tar, ...), as this kind of problem plagues pretty much every format that supports symlinks.
Thanks @AgentD!
libsqsh does sanity checking while extracting, not beforehand; I guess that's a faster approach at the cost of accepting some malformed archives. So, in that regard, it's just as secure as tar. sqsh-unpack uses mkstemp-extract-rename semantics to prevent writing through symlinks. That means doing the check in the library isn't needed.
Personally, I doubt that squashfs-tools has a decent test suite.
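For readers unfamiliar with the mkstemp-extract-rename pattern mentioned above, a rough Python sketch follows. It is an illustration of the general technique only, not the sqsh-unpack code; the function name extract_file and its parameters are hypothetical. The data is written to a freshly created temporary file in the destination directory and then renamed over the final name. Since rename replaces a symlink at the destination instead of following it, the write never goes through a pre-existing link.

```python
import os
import tempfile


def extract_file(dest_dir, name, data):
    """Write `data` to dest_dir/name via a temp file and a rename.

    Hypothetical helper: it only protects the final path component;
    parent directories would need their own checks.
    """
    # mkstemp creates a new, uniquely named file, so it can never
    # open an existing symlink.
    fd, tmp_path = tempfile.mkstemp(dir=dest_dir)
    try:
        with os.fdopen(fd, "wb") as f:
            f.write(data)
        # Replaces whatever is at the destination, including a
        # symlink, without following it.
        os.replace(tmp_path, os.path.join(dest_dir, name))
    except BaseException:
        os.unlink(tmp_path)
        raise
```

If an earlier archive entry had planted a symlink called name, the link itself is swapped for a regular file and its target stays untouched.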