*: improve performance
Currently umoci (especially umoci unpack) is less sprightly than would be ideal. Some of this is a natural result of working with tar archives (which disallow parallelism during extraction), but it feels like we should be able to extract things much more quickly. go-mtree generation is also quite expensive, so maybe we should consider generating it during extraction (though the concept has scared me in the past).
But it would also be nice to have pprof profiles generated as part of the test suite so we can get a better idea of where the slow points are.
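Something like this (a rough sketch; `withCPUProfile` is a hypothetical helper, not something our test suite has today) would be enough to dump a profile from an individual test, and plain `go test -cpuprofile=cpu.out` works for ad-hoc runs:

```go
// Hypothetical test helper: run fn under a CPU profile written to path, so a
// slow test can tell us where unpack is actually spending its time.
package layer_test

import (
	"os"
	"runtime/pprof"
	"testing"
)

func withCPUProfile(t *testing.T, path string, fn func()) {
	t.Helper()

	f, err := os.Create(path)
	if err != nil {
		t.Fatal(err)
	}
	defer f.Close()

	if err := pprof.StartCPUProfile(f); err != nil {
		t.Fatal(err)
	}
	defer pprof.StopCPUProfile()

	fn()
}
```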
It looks like a lot of the time is spent in decompression (which is sadly single-threaded, and performs about the same as stock encoding/gzip). Hashing does cost us a fair bit, but the read-ahead for pgzip takes up ~38% of our CPU time. And it's pretty clear that go-mtree is taking much more time to generate a manifest than it took us to do the extraction (~32% vs ~21%). If we generated the go-mtree manifest ourselves during extraction we might be able to get some solid speed gains (though this does mean we'd need to be careful to make sure the mtree output actually matches the filesystem, otherwise umoci repack will be confused).
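To make the shape of the problem concrete, here's a rough sketch (not our actual code) of the pipeline those numbers describe; digest hashing and the effectively-serial gzip decompression both sit on the hot path of every layer:

```go
// Illustrative pipeline only: layer blob -> sha256 (digest verification) ->
// pgzip -> archive/tar. Decompressing a single gzip stream is still serial,
// and the hash has to see every compressed byte before we can trust the blob.
package example

import (
	"archive/tar"
	"crypto/sha256"
	"io"

	"github.com/klauspost/pgzip"
)

func walkLayer(blob io.Reader) error {
	h := sha256.New()
	hashed := io.TeeReader(blob, h) // the "hashing" cost

	gz, err := pgzip.NewReader(hashed) // the "decompression" + read-ahead cost
	if err != nil {
		return err
	}
	defer gz.Close()

	tr := tar.NewReader(gz)
	for {
		hdr, err := tr.Next()
		if err == io.EOF {
			return nil
		}
		if err != nil {
			return err
		}
		// A real unpack would create hdr.Name on disk and copy tr into it.
		_ = hdr
		if _, err := io.Copy(io.Discard, tr); err != nil {
			return err
		}
	}
}
```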
I guess this mirrors what you've seen @tych0?

Yeah, exactly. Ideally we wouldn't really need mtree at all :)
Ideally, yes. But we could also take another look at switching just the decompression side to zlib (as we discussed in #225 and #233). Unfortunately, in the meantime vitess have deleted their cgzip code, so we'll need to include it in our repo as a third_party module...
Or maybe modernize and switch to zstd? I'm working on moving us to a parallel-extraction, overlayfs, no-mtree model right now to get rid of this problem, so I don't think we're so worried about which compression algorithm we'll use.
While we can (and should) add support for zstd, the issue is that most other image builders aren't producing zstd images (and others don't support zstd yet), so we're going to have performance issues with decompression for at least a little while longer. For umoci I'd propose we add a new --compression option (in line with what I described in #316 and #300).
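For illustration, the decompression side could end up looking something like this (the +zstd OCI media type and the klauspost zstd decoder are real; the helper itself and how it'd be wired into umoci are just a sketch):

```go
// Sketch only: pick a decompressor from the layer's OCI media type, so a
// zstd-capable umoci can still consume the gzip images everyone else builds.
package example

import (
	"fmt"
	"io"

	"github.com/klauspost/compress/zstd"
	"github.com/klauspost/pgzip"
)

func layerReader(mediaType string, blob io.Reader) (io.Reader, error) {
	switch mediaType {
	case "application/vnd.oci.image.layer.v1.tar":
		return blob, nil
	case "application/vnd.oci.image.layer.v1.tar+gzip":
		return pgzip.NewReader(blob)
	case "application/vnd.oci.image.layer.v1.tar+zstd":
		dec, err := zstd.NewReader(blob)
		if err != nil {
			return nil, err
		}
		return dec.IOReadCloser(), nil
	default:
		return nil, fmt.Errorf("unsupported layer media type %q", mediaType)
	}
}
```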
Regarding the cost of go-mtree: yes, that's what I found too, which is why I pushed for raw unpack in https://github.com/opencontainers/umoci/pull/239
But parallel extraction combined with overlayfs is also what we ended up using for our solution :)
@cyphar is there anything actionable here for a new contributor? I'm using umoci as the base for an OCI plugin in the Pants build system, and unpack/repack is a substantial portion of many build steps. Just not sure where to start in this codebase...
@tgolsson I'm not sure if there's a nice place to start necessarily. I think the best performance improvement we could make here is generating the mtree manifest during unpack (meaning we create a "fake" manifest as we extract, eliminating the need for umoci unpack to re-walk the extracted tree). The main issues are going to be:
- Dealing with archives that have multiple entries with the same name (and dealing with whiteout entries in subsequent archives). This will require re-writing (and deleting) existing entries in the mtree list, which isn't keyed by path in go-mtree IIRC.
- Dealing with metadata and making sure it'll match the metadata on-disk. However, we use `tar_time`, so this might actually be perfectly fine to just copy from the tar metadata. My main concern is xattrs: we might need to update the entry for each mtree block in `restoreMetadata()` (because that's the actual moment when we know whether the xattrs were actually applied or if the filesystem didn't support them).
The slightly annoying thing is that go-mtree cannot handle the full-path format of mtree (where each entry is referenced by its full path, as opposed to the stack-based scheme that go-mtree and BSD mtree use by default). If it did, we would be able to generate all the entries independently and then just stitch them together at the end (and both problems would be solved fairly easily). However, this is a bug in go-mtree so we would need to fix it there (the main issue is that the behaviour of pathnames changes drastically in full-path mode, and how this interacts with comparisons between manifests will need to be thoroughly tested).
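To sketch what "generate all the entries independently" could look like (assuming the full-path handling in go-mtree gets fixed; the go-mtree types are real, but this manifest builder is just an illustration, not umoci code):

```go
// Sketch of a path-keyed "fake" manifest built during extraction: later
// layers overwrite entries, whiteouts delete them, and we only emit a
// DirectoryHierarchy of FullType entries once unpacking has finished.
package example

import (
	"sort"

	mtree "github.com/vbatts/go-mtree"
)

type fakeManifest struct {
	entries map[string]*mtree.Entry // keyed by full path inside the rootfs
}

func (m *fakeManifest) upsert(path string, keywords []mtree.KeyVal) {
	m.entries[path] = &mtree.Entry{
		Name:     path,
		Type:     mtree.FullType,
		Keywords: keywords,
	}
}

func (m *fakeManifest) whiteout(path string) {
	// The path must not appear in the final manifest (a real implementation
	// would also have to drop any children if path is a directory).
	delete(m.entries, path)
}

func (m *fakeManifest) hierarchy() *mtree.DirectoryHierarchy {
	paths := make([]string, 0, len(m.entries))
	for p := range m.entries {
		paths = append(paths, p)
	}
	sort.Strings(paths)

	dh := &mtree.DirectoryHierarchy{}
	for _, p := range paths {
		dh.Entries = append(dh.Entries, *m.entries[p])
	}
	return dh
}
```

With something like this, the duplicate-entry and whiteout problems above turn into plain map operations, and the xattr concern becomes "call upsert again from restoreMetadata once we know what was actually applied".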
Honestly, it's probably simpler for me to take a crack at this. I wasn't sure if this was still an issue for most people so I didn't set aside any time to work on it (and I've been busy with other projects recently).
But if you would like to help, taking a look at the go-mtree side of things would be appreciated. https://github.com/vbatts/go-mtree/issues/146 is the upstream bug (my comment there from a few years ago misdiagnosed the problem: casync's output is valid mtree and go-mtree cannot handle it).
FWIW, I have since figured out and fixed the issues in go-mtree: https://github.com/vbatts/go-mtree/pull/188 fixes the relevant bugs and lets us use the FullType version of the manifest, meaning we can build a per-full-path map during unpacking and splat out a manifest at the end.