Make it easier for contributors to update Docker images
There is some background context in this Discord discussion.
As with most OSS projects, a push to the Dockerfile on main triggers a rebuild on an ephemeral runner, which uses the project token to push to the GitHub package registry with no additional side-channel auth. The image tag is based on the branch name, so if I push to "test-my-change", the image gets pushed under the "test-my-change" tag. Basically, the security perimeter ends up being defined as who has write access to the repo, and every image is traceable to a commit. This seems superior in every way. We could also configure GitHub Actions to push to another container registry, but we would need to manage the secrets, etc. (and we'd be right back where we started).
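A minimal sketch of that kind of workflow, assuming GitHub Container Registry (the paths and image name here are illustrative, not any project's actual configuration):

```yaml
name: Build and push Docker image
on:
  push:
    paths:
      - "docker/**"  # rebuild whenever a Dockerfile changes (path illustrative)
permissions:
  contents: read
  packages: write  # the built-in GITHUB_TOKEN can push to ghcr.io; no extra secrets
jobs:
  build-and-push:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - name: Log in to the GitHub Container Registry
        run: echo "${{ secrets.GITHUB_TOKEN }}" | docker login ghcr.io -u "${{ github.actor }}" --password-stdin
      - name: Build and push, tagging the image with the branch name
        run: |
          IMAGE="ghcr.io/${{ github.repository }}/base:${{ github.ref_name }}"
          docker build -t "$IMAGE" docker/
          docker push "$IMAGE"
```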
We have ~17 Dockerfiles in the main repo that require authentication (Storage Admin role in the iree-oss GCP project) to update: https://github.com/openxla/iree/tree/main/build_tools/docker. It would be much more convenient for contributors and maintainers if any approved commit could update the images used.
We should see what other projects are doing and modify our setup to be less bespoke.
Possible requirements:
- Contributors can test that a Dockerfile builds
- Contributors can test that a Dockerfile does what is expected in a CI workflow
- Postsubmit uses the latest build of all images from checked-in code (our current solution uses a manifest file for this, which can get out of sync)
TBD how much scripting/automation is needed for this.
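For the first requirement, a presubmit check along these lines would let any contributor verify that a Dockerfile builds, with no registry credentials involved (a sketch; the directory layout is assumed for illustration):

```yaml
name: Check Dockerfile builds
on:
  pull_request:
    paths:
      - "build_tools/docker/**"
jobs:
  docker-build:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - name: Build without pushing
        run: docker build build_tools/docker/base  # directory name illustrative
```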
And then we could just move IREE's Docker image building over there. I just set up a GitHub Action to update the images when the files change, and they get pushed to GitHub's container registry. It also seems like we should have fewer Docker images, but I haven't looked closely in quite some time.
As part of moving more jobs from ci.yml to pkgci.yml, I've been chipping away at our use of Docker images. Better to not use Docker at all if we can avoid it.
- (Already done, for build/test jobs but not benchmark jobs) NVIDIA CI jobs can use whatever the runners already have (this seems good enough for now), or https://github.com/Jimver/cuda-toolkit, instead of `nvidia.Dockerfile` or `nvidia-bleeding-edge.Dockerfile`
- (In progress) Android CI jobs can use https://github.com/nttld/setup-ndk instead of `android.Dockerfile`
- (Not started) Emscripten CI jobs (if we care) can use https://github.com/mymindstorm/setup-emsdk instead of `emscripten.Dockerfile`
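As a sketch of what the setup-action approach looks like for the Android case (the NDK version and CMake invocation here are assumptions for illustration, not IREE's actual configuration):

```yaml
name: Android build (sketch)
on: pull_request
jobs:
  android-build:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - name: Install the Android NDK (replaces android.Dockerfile)
        id: setup-ndk
        uses: nttld/setup-ndk@v1
        with:
          ndk-version: r25c  # assumed version; pick whatever the build needs
      - name: Configure using the NDK's CMake toolchain file
        run: >
          cmake -B build
          -DCMAKE_TOOLCHAIN_FILE="${{ steps.setup-ndk.outputs.ndk-path }}/build/cmake/android.toolchain.cmake"
```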
Seems like we could fork https://github.com/nod-ai/base-docker-images into iree-org and iterate from there. Can keep the original repo in nod-ai so existing packages continue to exist and new dockerfiles/images specific to that github org can be developed.
Sent an RFC to fork that repo into iree-org: https://groups.google.com/g/iree-discuss/c/IPLzMsPb5UI
I've almost finished switching workflows to using dockerfiles hosted at https://github.com/iree-org/base-docker-images/.
Here's what's left:
- [x] `build_test_all_bazel` in `ci.yml` uses `gcr.io/iree-oss/base-bleeding-edge`
- [x] `build:remote_cache_bazel_ci` in `iree.bazelrc` uses `gcr.io/iree-oss/base-bleeding-edge` in the cache key value
- [x] `publish_website` in `publish_website.yml` uses `gcr.io/iree-oss/base`
- [x] `web` in `samples.yml` uses `gcr.io/iree-oss/emscripten`
- [ ] `linux_arm64_clang` in `ci_linux_arm64_clang.yml` uses `gcr.io/iree-oss/base-arm64`
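Each of those items is essentially a one-line swap of the job's container reference. A rough sketch using the `publish_website` case (image name and tag are illustrative placeholders):

```yaml
name: Publish website (sketch)
on: push
jobs:
  publish_website:
    runs-on: ubuntu-latest
    container:
      # Before: gcr.io/iree-oss/base@sha256:... (GCP-hosted, pinned via a manifest)
      # After: an image published from iree-org/base-docker-images
      image: ghcr.io/iree-org/base:main
    steps:
      - uses: actions/checkout@v4
      - run: ./build_docs.sh  # placeholder for the job's real steps
```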
For Bazel, we could try using https://bazel.build/install/docker-container (e.g. gcr.io/bazel-public/bazel:latest). That wouldn't have any of our other build deps or configurations for ramdisks, remote caches, etc. baked in though... if any of that is still critical.
Actually, never mind re: Bazel. The source for that image is https://github.com/bazelbuild/continuous-integration/blob/master/bazel/oci/Dockerfile and it's just for running Bazel. The entrypoint is hardcoded to /usr/local/bin/bazel, when I think we really want a general-purpose container with software installed on it.
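To illustrate the entrypoint issue (a sketch based on the hardcoded entrypoint noted above):

```yaml
name: Bazel image demo (sketch)
on: workflow_dispatch
jobs:
  demo:
    runs-on: ubuntu-latest
    steps:
      # Because the image's entrypoint is the bazel binary itself, arguments
      # to `docker run` become bazel subcommands: fine for invoking Bazel,
      # awkward as a general-purpose environment for arbitrary CI steps.
      - run: docker run --rm gcr.io/bazel-public/bazel:latest version
```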
I'm planning to start migrating build_test_all_bazel to a new dockerfile running on the new Azure build cluster soon. We'll see what issues I run into :)
Done!
- Dockerfiles are now hosted in https://github.com/iree-org/base-docker-images/, which contains automated workflows to publish to GitHub's Container registry (https://docs.github.com/en/packages/working-with-a-github-packages-registry/working-with-the-container-registry). Members of the iree-write team can contribute to that repository to make updates to the published images, with no additional privileges required.
- There are 6 dockerfiles there at the moment
- 3 dockerfiles are forked between "ghr" (GitHub Runner) and "non-ghr" variants, and we aren't actually using the "ghr" variants right now
- Nearly all workflows just use the `cpubuilder` image, with the `manylinux` image used for Python package building and the `amdgpu` image currently used (maybe?)
- All workflows have been updated to use the new dockerfiles, or no dockerfile at all.
- When not using Docker, workflows instead rely on one or more of:
- software included in GitHub's standard runners
- software included in our own self-hosted runners
- software that can be easily and quickly installed on demand (e.g. python packages via pip, the Emscripten SDK, etc.)
- Documentation for our usage of Docker is now at https://iree.dev/developers/general/github-actions/#docker-and-dependencies