Is there a way to extract non-wheel outputs from a cibuildwheel Docker container?

Open jbarlow83 opened this issue 2 years ago • 10 comments

Description

As far as I can tell, cibuildwheel is only prepared to copy wheels back to the host and has no room for exporting side-channel shenanigans like the ones I'm considering. Is that true, or is there a supported way to copy non-wheel objects to the host?

Context: I'm exploring options to generate manylinux-aarch64 wheels for pikepdf using cibw, running a QEMU-emulated Docker container for aarch64. The main problem with this case, as observed in other open issue tickets, is that emulation is really, really slow.

Cross-compiling seems to be a dead end and can be quite tricky with third-party libraries whose configure/make scripts don't always account for cross-compiling.

I think a fairly straightforward (for some generous definition of "fairly" and "straightforward") option would be to wire up ccache to speed up builds. The problem is that the cache needs to be copied from the Docker container back to the host (which would presumably use its CI runner to save and restore the cache). An alternative would be to copy precompiled binaries in, with a way to check whether the cache is stale.

Copying data from the host into the container isn't that much of an issue - the /project folder is passed to the container and could contain a cache folder.

jbarlow83 • Feb 20 '22 05:02

Yeah, this seems reasonable. In https://github.com/pypa/cibuildwheel/issues/363 we previously discussed some sort of specific extraction of files from the build as well, for example coverage reports. But the API design in that issue is about copying out artifacts from each build into a separate folder. For a cache, you probably don't want the files to move around, and perhaps you also want to share them between every build (and even build architecture?).

So perhaps something closer to a mount is suitable? For example, CIBW_DOCKER_CACHE_DIR={project}/cache would copy ./cache into /project/cache at the start of the build, then copy it back out and replace ./cache when the container finishes.
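
To make the proposed semantics concrete, here is a rough Python sketch of the copy-in/copy-out behaviour. CIBW_DOCKER_CACHE_DIR is only a proposal at this point, and `sync_dir`, `run_with_cache`, and the `build` callback are hypothetical names used purely for illustration; plain local directories stand in for the host and the container.

```python
import shutil
from pathlib import Path
from typing import Callable


def sync_dir(src: Path, dst: Path) -> None:
    """Replace dst with a copy of src. Does nothing if src does not exist."""
    if not src.exists():
        return
    if dst.exists():
        shutil.rmtree(dst)
    shutil.copytree(src, dst)


def run_with_cache(
    host_cache: Path,
    container_cache: Path,
    build: Callable[[Path], None],
) -> None:
    """Hypothetical mount-like behaviour: copy in, build, copy out and replace."""
    # Start of the build: copy the host cache into the container.
    sync_dir(host_cache, container_cache)
    # Cold start: make sure the build always has a cache directory to write to.
    container_cache.mkdir(parents=True, exist_ok=True)

    build(container_cache)

    # End of the container: copy the cache out and replace the host copy.
    sync_dir(container_cache, host_cache)
```

Because the host copy is replaced wholesale at the end, every build sees whatever the previous one left behind, which is the "files don't move around" property a cache needs.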

I suppose the question is whether such an API would actually be sufficient for the use cases in #363 as well. In which case a more general name for the option might be appropriate. In any case, I'd be curious to hear your opinion on this @jbarlow83, does this sound like it would fit your use case? It's been a long while since I've used ccache!

joerick • Mar 21 '22 22:03

I have the same question. I also want to use ccache. Right now I'm passing a CCACHE_DIR environment variable to the container. It's unclear from the documentation whether /project is read-only from the perspective of the host. I'm now trying /host/path/to/ccachedir (following https://cibuildwheel.readthedocs.io/en/stable/faq/#linux-builds-on-docker) to see if it works.

thomaslima • May 02 '22 23:05

@thomaslima did you end up having any luck with that approach?

Mause • Nov 16 '22 01:11

Hi @Mause, thanks for reaching out. Indeed this approach suggested by @joerick is working for me.

  • Step 1: Install ccache on the host. Take note of the .ccache directory location.
  • Step 2: Download the cache from GitHub.
  • Step 3: Install ccache inside the Docker container, and copy the .ccache folder contents from the host into /output.
  • Step 4: Build the project with cibuildwheel as normal.
  • Step 5: .ccache will now be inside the wheelhouse folder on the host. Copy it back to the host's .ccache directory from step 1.
  • Step 6: Upload the ccache contents to GitHub.

Here's the code I used for step 3 (before_all step): https://github.com/KLayout/klayout/blob/1f2e8b40125518bc50235802753096348998b409/ci-scripts/docker/docker_prepare.sh#L34-L40

Here's the code I used for step 5: https://github.com/KLayout/klayout/blob/1f2e8b40125518bc50235802753096348998b409/.github/workflows/build.yml#L49-L59

Steps 2 and 6 were done with the hendrikmuhs/ccache-action GitHub Action.

Hope this helps. If this issue is worked on, I would like it to implement steps 3 and 5. It would be nice to be able to pass a list of directories that get serialized and deserialized to and from the Docker container.
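
For anyone who wants the gist of steps 3 and 5 without reading the linked scripts, here is a rough Python sketch of the two copies. It is not the KLayout code (that is a shell script and a workflow file); `seed_container_cache` and `restore_host_cache` are hypothetical helper names, and the paths are passed in as parameters so nothing about the real layout is assumed.

```python
import shutil
from pathlib import Path


def seed_container_cache(host_ccache: Path, output_dir: Path) -> Path:
    """Step 3 (inside the container): copy the host's cache under the output dir."""
    target = output_dir / ".ccache"
    if host_ccache.is_dir():
        shutil.copytree(host_ccache, target, dirs_exist_ok=True)
    else:
        # Cold start: no cache was restored, so begin with an empty one.
        target.mkdir(parents=True, exist_ok=True)
    # The build should point CCACHE_DIR at this path.
    return target


def restore_host_cache(wheelhouse: Path, host_ccache: Path) -> None:
    """Step 5 (on the host): move the cache out of the wheelhouse folder."""
    built = wheelhouse / ".ccache"
    if not built.is_dir():
        return
    if host_ccache.exists():
        shutil.rmtree(host_ccache)
    # Moving (not copying) keeps the cache out of the uploaded wheel artifacts.
    shutil.move(str(built), str(host_ccache))
```

In the setup described above, step 3 would be called from the before_all hook with the host cache reached through the /host mount and `output_dir` set to /output, and step 5 would run on the runner once cibuildwheel has finished.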

thomaslima • Dec 01 '22 04:12

While slightly off-topic for the original issue, there seems to be some interest from the participants specifically in caching compilation results, so I'll mention a slightly prettier alternative to @thomaslima's approach: instead of using ccache, use sccache with its GHA integration. That communicates directly with GitHub's cache storage without using an on-disk cache, which simplifies things a lot.

  1. Use the sccache action and enable GHA in sccache: here
  2. Pass the necessary environment variables into the container: here
  3. Install sccache inside the container: here
  4. Print sccache stats to verify (the action in step 1 also prints a report, but it doesn't seem to capture stats from inside the container): here
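
As a rough illustration of step 2, the Python sketch below builds the environment for a cibuildwheel invocation so that the sccache-related variables get forwarded into the container through CIBW_ENVIRONMENT_PASS_LINUX. The three variable names reflect my understanding of what sccache's GHA backend reads and are an assumption that may not hold for every sccache or runner version; `cibw_env_for_sccache` is a hypothetical helper, not part of either tool.

```python
import os
from typing import Dict, Mapping

# Assumption: variables sccache's GitHub Actions backend needs in the container.
SCCACHE_VARS = ("SCCACHE_GHA_ENABLED", "ACTIONS_CACHE_URL", "ACTIONS_RUNTIME_TOKEN")


def cibw_env_for_sccache(base_env: Mapping[str, str]) -> Dict[str, str]:
    """Return a copy of base_env that asks cibuildwheel to forward SCCACHE_VARS."""
    env = dict(base_env)
    # Switch on the GHA backend unless the caller already set it.
    env.setdefault("SCCACHE_GHA_ENABLED", "true")

    # CIBW_ENVIRONMENT_PASS_LINUX is a space-separated list of variable names.
    already = env.get("CIBW_ENVIRONMENT_PASS_LINUX", "").split()
    # dict.fromkeys drops duplicates while keeping the original order.
    merged = dict.fromkeys([*already, *SCCACHE_VARS])
    env["CIBW_ENVIRONMENT_PASS_LINUX"] = " ".join(merged)
    return env


if __name__ == "__main__":
    print(cibw_env_for_sccache(os.environ)["CIBW_ENVIRONMENT_PASS_LINUX"])
```

In a workflow you would normally set the same thing directly in the job's `env:` section; the function just makes the merging rule explicit.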

bmerry • Aug 24 '23 10:08

@bmerry not all links work in the above comment. Could you fix that?

BTW, I don't understand how the cache ends up in the container in your example.

mwestphal • Sep 03 '23 13:09

> @bmerry not all links work in the above comment. Could you fix that?

Thanks for pointing it out - I've edited the comment and updated the links to point to the latest version.

> BTW, I don't understand how the cache ends up in the container in your example.

The cache contents aren't in the container; they're in the cloud.

bmerry • Sep 04 '23 06:09

> The cache contents aren't in the container; they're in the cloud.

Yes, but what mechanism is responsible for downloading them locally? AFAICS I need to use a cache action somewhere to do that.

mwestphal • Sep 04 '23 07:09

> > The cache contents aren't in the container; they're in the cloud.

> Yes, but what mechanism is responsible for downloading them locally? AFAICS I need to use a cache action somewhere to do that.

sccache is the mechanism. It has GitHub Actions integration.

bmerry • Sep 04 '23 07:09

You mean the sccache action? OK, it's clearer now, thanks.

mwestphal • Sep 04 '23 07:09