Investigate a BuildKit frontend for fully-local builds
The current implementation of pack build for creating new images with lifecycle + pack is more efficient than BuildKit when used to publish images directly to a registry (with --publish). But it's likely not more efficient than BuildKit for fully-local builds against the Docker daemon (without --publish).
One idea to improve performance would be to switch to using a local registry for builds that don't use --publish. As an alternative, however, we could consider implementing fully-local builds using a BuildKit frontend. This would require significant work in both pack and the lifecycle, but given that --publish is not the default, I think it's worth investigating.
A frontend already exists for v2 buildpacks: https://github.com/tonistiigi/buildkit-pack
CC: @ekcasey @jromero
Hi folks! I've made a barebones BuildKit frontend (available here). It doesn't offer full functionality/spec compatibility yet, but I would love to hear your feedback!
This is great! Thanks for the initiative. I haven't played with it or looked at it in depth but hope to do so next week. I can provide more feedback then.
Awesome work @EricHripko! How are you handling launch layers that aren't rebuilt or cached (so they're not present in the container), but still need to be carried over from the last image?
After chatting with Tonis, it seems like we'd need https://github.com/moby/buildkit/issues/1431 to implement our caching model.
Right now the frontend isn't doing anything smart when it comes to reusing launch layers (thus every build is from scratch). That being said, I think MergeOp isn't (strictly speaking) needed if the scope of the frontend is to just support local workflows. From what I understand from the spec on launch layers, caching can be achieved with some creative (and somewhat hacky) use of cache mounts.
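For illustration, here's a minimal LLB sketch of what a cache-mount-based approach might look like (the lifecycle invocation, mount path and cache ID are illustrative assumptions, not what the frontend currently does):

```go
package frontend

import (
	"github.com/moby/buildkit/client/llb"
)

// buildWithCacheMount sketches running the lifecycle builder with /layers
// backed by a BuildKit cache mount, so cached layers survive between builds.
func buildWithCacheMount(builder llb.State) llb.State {
	run := builder.Run(
		// Illustrative lifecycle invocation; the flags and paths are assumptions.
		llb.Args([]string{"/cnb/lifecycle/builder", "-app", "/workspace"}),
		// Persist /layers across builds; the cache ID determines which
		// builds share this cache.
		llb.AddMount("/layers", llb.Scratch(),
			llb.AsPersistentCacheDir("cnb-layers", llb.CacheMountLocked)),
	)
	return run.Root()
}
```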
This bit is a very interesting problem to solve:
// Export the produced layers into an OCI image. Unlike other high-level
// functions in this package, we have to manage the export manually (without
// lifecycle) to fit in with the BuildKit model of the world.
It would be nice if the lifecycle export function could be used. Lacking solutions but wanted to call it out as a pain point if we need to maintain multiple "exporting" implementations.
CC: @buildpacks/implementation-maintainers
The idea of needing a Dockerfile to execute buildpacks, which in most cases would only require one option (the builder), seems like a poor experience. If the user ends up requiring more configuration options (i.e. buildpacks, env vars, etc.), it feels like we would be reimplementing project.toml. This makes me wonder if we could leverage, or better yet overlap with, project.toml. There is an upcoming option to add builder to project.toml.
Examples
NOTES:
- In the examples below the input format is exactly the same (TOML).
- The tag of the image could furthermore define the project descriptor API version. In these cases, the API version would be 0.2.
project.toml
File System
.
├── project.toml
├── <source code>
└── ...
Configuration
project.toml
# syntax = buildpacksio/buildkit:0.2
# ...
[build]
builder="my-builder"
# ...
Execution
docker build -t cnbs/sample-app -f project.toml .
Dockerfile
File System
.
├── Dockerfile
├── <source code>
└── ...
Configuration
Dockerfile
# syntax = buildpacksio/buildkit:0.2
build.builder="my-builder"
Execution
docker build -t cnbs/sample-app .
+1 on @jromero's idea of using the project descriptor as the input file. I think it also raises the question that was asked in the builder RFC: how many of pack's flags should we replicate in the project descriptor, and how do we distinguish the ones that are pack-specific from the others while still keeping config management easy on the pack side? (Might be off-topic for this issue.)
If we think about the project descriptor as a way to configure the project for a "platform", as opposed to pack specifically (and now BuildKit), we should have a better definition of what should be included vs. excluded.
From https://github.com/buildpacks/lifecycle/issues/423#issuecomment-811974384:
This may also be useful for the BuildKit integration. Specifically in removing the need to reimplement export.
It would require more thought, but maybe the export phase could take an option to export as OCI layout format. This OCI layout directory or archive can then be read by the BuildKit frontend and processed to include layers and image config (labels, env vars, etc).
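To make that concrete, here's a rough sketch of how a frontend could consume such an output, assuming the lifecycle wrote a standard OCI image layout directory (this uses the opencontainers/image-spec Go types; the export option itself is still hypothetical):

```go
package frontend

import (
	"encoding/json"
	"fmt"
	"os"
	"path/filepath"

	ocispec "github.com/opencontainers/image-spec/specs-go/v1"
)

// readOCILayout lists the manifests referenced by an OCI layout's index.json.
// A real frontend would go on to load the manifest and image config blobs from
// blobs/<alg>/<hex> and turn the layer blobs into BuildKit inputs.
func readOCILayout(dir string) error {
	raw, err := os.ReadFile(filepath.Join(dir, "index.json"))
	if err != nil {
		return err
	}
	var index ocispec.Index
	if err := json.Unmarshal(raw, &index); err != nil {
		return err
	}
	for _, desc := range index.Manifests {
		fmt.Println(desc.MediaType, desc.Digest)
	}
	return nil
}
```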
I was going to comment along the same lines. Being quite new to Buildpacks, I found it surprising (in terms of separation of concerns) that the lifecycle, rather than the platform, is responsible for exporting images. It's strange that the API boundary isn't drawn at the image spec layer; instead, the exporter has to support various export methods (the Docker daemon or an image registry today). Does this mean that, should Buildpacks want to support podman (or some other future tool), all exporter implementations will have to support an extra export method?
The idea of needing a Dockerfile to execute buildpacks which in most cases would only require one option (the builder) seems like a poor experience.
Definitely agree on this point if there's a better alternative available (e.g., builder setting in project.toml).
If the user ends up requiring more configuration options (ie. buildpacks, env vars, etc), it feels like we would be reimplementing project.toml
Note that BuildKit does support additional inputs (these usually map to docker build arguments). For some configuration options (e.g., env. vars), it might make sense to support them this way:
# docker-compose.yaml
build:
  dockerfile: project.toml
  args:
    VAR: VALUE # or below
    VAR: $VAR
(the dockerfile key is quite awkwardly named though, I must admit)
Note that BuildKit does support additional inputs (these usually map to docker build arguments). For some configuration options (e.g., env. vars), it might make sense to support them this way:
How do args work with frontend syntax? For example, are args replaced prior to being provided to the frontend or does the frontend have to process them (and provide ARG syntax for it)?
IIUC, it's fairly straightforward, as you get the final values of the build args as a map[string]string. I've jotted down a barebones implementation of this in my repo (see https://github.com/EricHripko/cnbp/commit/32bb24b6686d2baa247627698ac397177dad108c). Setting --build-arg BP_JVM_VERSION=8 lets me control the version of Java that the detected buildpacks end up using.
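For reference, a stripped-down sketch of how the frontend reads those values, assuming the same build-arg: prefix convention the Dockerfile frontend uses:

```go
package frontend

import (
	"strings"

	"github.com/moby/buildkit/frontend/gateway/client"
)

// buildArgs extracts --build-arg values from the frontend's build options,
// where they arrive as keys prefixed with "build-arg:".
func buildArgs(c client.Client) map[string]string {
	args := map[string]string{}
	for k, v := range c.BuildOpts().Opts {
		if strings.HasPrefix(k, "build-arg:") {
			args[strings.TrimPrefix(k, "build-arg:")] = v
		}
	}
	return args
}
```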
Another quick update: I had a go at implementing caching with BuildKit build-time mounts. It somewhat works with a bunch of caveats (see https://github.com/EricHripko/cnbp/commit/96e4d774d8b601cf46f66f3830cfa03279f07ced); key takeaways for me were:
- Difference in caching models - BuildKit caches are machine-wide by default, whilst Buildpacks seem to expect the cache to be specific to the app (essentially the image name) you're building. I did a trial spin and it seems like the default caching logic in exporter wipes out the layers that aren't related to the current build. While the BuildKit cache can be "namespaced", a unique identifier of some sort is needed. Unfortunately, it doesn't seem like BuildKit receives the image tag as an input - so no straightforward answer there. Note that this could be why the Dockerfile frontend only allows namespacing caches via the (somewhat hidden) BUILDKIT_CACHE_MOUNT_NS build arg. Not sure how to best bring these two together - would love to hear ideas from folks here! (A rough namespacing sketch follows this list.)
- Incomplete restore - due to the fact that lifecycle supports only a limited set of sources for incremental builds (discussed above), only cache=true layers are reused (launch=true layers aren't). Whilst launch=true reuse could be implemented, it'd come at the cost of reimplementing analyzer & restorer in BuildKit terms (unless something like https://github.com/buildpacks/pack/issues/1125 is implemented).
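To make the namespacing point concrete, one option would be a dedicated build arg in the spirit of BUILDKIT_CACHE_MOUNT_NS; a hypothetical sketch (the arg name and fallback are assumptions, not an existing option):

```go
package frontend

import (
	"github.com/moby/buildkit/client/llb"
	"github.com/moby/buildkit/frontend/gateway/client"
)

// layersCacheMount returns a cache mount for /layers whose ID is namespaced by
// an (assumed) BUILDKIT_CACHE_MOUNT_NS-style build arg, so that unrelated
// projects don't share or wipe each other's layers.
func layersCacheMount(c client.Client) llb.RunOption {
	ns := "default"
	if v, ok := c.BuildOpts().Opts["build-arg:BUILDKIT_CACHE_MOUNT_NS"]; ok && v != "" {
		ns = v
	}
	return llb.AddMount("/layers", llb.Scratch(),
		llb.AsPersistentCacheDir(ns+"/cnb-layers", llb.CacheMountLocked))
}
```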
Namespacing a single cache volume would be great for CNB/pack anyway IMO. We've talked about machine-wide cache solutions in the past. Maybe something like project.toml's id or name could, by default, be the cache namespace. That would allow you to build multiple images and share the same cache as long as it is the same project. We could look at opening up cache via something in project.toml like [io.buildpacks.cache] scope = machine or similar if users really want to share cache across all their CNB projects.
Just something to think about...
RE: cache scoping
There was an RFC I opened some time ago regarding cache scoping. The conversations there may be relevant here. The final outcome was that the concerns about cache poisoning were too great. That said, it may be worth re-discussing based on new limitations/findings.
Hi @EricHripko, there are various people in the community that are very interested in the work you've been doing. It would be great if you could join one of our Office Hours or Working Groups to talk to the broader audience about this effort. I understand if it isn't ideal but if you can, please reach out so we can schedule something and have the right people in the room.
email: [email protected]
Hi @jromero - thank you for this invitation! I've emailed you to understand this a bit more.
BuildKit Update:
I did talk to @ekcasey about the possible integration with BuildKit and we landed on the idea that the lifecycle should support OCI standards as the general practice going forward.
Short-term, what this means is that we're (I'm) going to investigate how an OCI layout format interface (analyze/export) would work. This would hopefully yield an RFC.
Long-term, it would be ideal if we could drop daemon support (shifted to be a platform concern) and only support OCI registries and OCI layout.
-- posted on #implementation channel
@EricHripko thanks so much for the insightful presentation last week!
I'm having a bit of trouble following how the llb.Copy operations that create the layers in the POC (such as here) would mesh with the lifecycle if we added support for exporting images in OCI layout format. Is there a different LLB operation that would be used instead, if we already have a tarball for each layer?
If images were exported in OCI layout format, could they also be consumed this way? I.e., when analyzer & restorer are pulling data off the "previous image" could that image be in OCI layout format as well? (And exporter too for that matter...putting the question of MergeOp aside for the time being.)
Thank you for sharing the recording, and thanks to the community for having me!
At the very least, llb.Copy supports the AttemptUnpack flag, which should be able to take a tarball and turn it into a layer with BuildKit (somewhat equivalent to the ADD instruction in a Dockerfile). It's worth pointing out that this would be rather hacky, as the build process would involve an extra compress/decompress cycle. The performance overhead of this depends on whether layers are stored in .tar, .tar.gz or another format.
There are various ways to avoid this overhead: BuildKit could potentially have a lower-level "add layer" option, or the exporter could export a .tar/OCI folder structure, or some other compromise.
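For reference, a minimal sketch of the AttemptUnpack approach (the source state and paths are illustrative):

```go
package frontend

import (
	"github.com/moby/buildkit/client/llb"
)

// addLayerTarball copies a layer tarball into the base state and asks BuildKit
// to unpack it at the destination, much like a Dockerfile ADD of an archive.
func addLayerTarball(base, layers llb.State, tarPath string) llb.State {
	return base.File(llb.Copy(layers, tarPath, "/", &llb.CopyInfo{
		// Extract the archive rather than copying the .tar file itself.
		AttemptUnpack: true,
	}))
}
```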
cc @jericop @natalieparellano
Just a quick link here; maybe it could be useful for the LFX mentorship program we are running this period.