
Not concurrency safe

Open ferbs opened this issue 2 years ago • 8 comments

I noticed that kaniko uses its own /kaniko bin directory for working runtime data, making it unsafe to run concurrently and perhaps unsafe to run more than once sequentially. Eg, copyDockerfile in cmd/executor/cmd/root.go always copies the target dockerfile to the same location /kaniko/Dockerfile.

I'd like to run Kaniko with AWS Lambda but that environment reuses running containers--I don't know of a way to guarantee a fresh container for each time it's invoked.

Please consider keeping all kaniko runtime data in a separate subdirectory per kaniko execution, ideally within /tmp or a specified parent KANIKO_WORKING_DATA directory, embedding timestamp + random chars in the subdir name.

ferbs avatar Jan 02 '22 02:01 ferbs

I think this is a known limitation, perhaps this should be spelled out better in docs. Kaniko's intended use case is to be run inside a single-use container, building and pushing an image, then being torn down. Reusing the container environment is not a goal.

For https://kaniko.kontain.me I account for this by only allowing one concurrent request in Cloud Run, but it sounds like that might not even be sufficient, since a subsequent request that reuses the environment may get tainted by previous runs.

Please consider keeping all kaniko runtime data in a separate subdirectory per kaniko execution, ideally within /tmp or a specified parent KANIKO_WORKING_DATA directory, embedding timestamp + random chars in the subdir name.

That's worth considering, and I'd be willing to help work through what it would take. I think we'd have to keep things inside /kaniko, to avoid tainting the image that's being built, but I think it could be /kaniko/<execution-ID>/Dockerfile, etc., for instance, and just have those deleted and/or ignored for future executions.

imjasonh avatar Jan 02 '22 03:01 imjasonh

spelled out better in docs

The wording in your comment would be great for that: "Kaniko's intended use case is to be run inside a single-use container, building and pushing an image, then being torn down." And maybe make it more explicit by adding: "It is not meant to run within a function-as-a-service / serverless environment (like AWS Lambda, Azure Container Apps, Knative, etc.) without externally controlling concurrency."

Reusing the container environment is not a goal

There is a lot of growth in this FaaS stuff but I haven't found build tooling that works natively. Maybe a good goal to work towards for kaniko after all? Or if it's not a sensitive subject, I'd be curious to know the reason. (Not what the maintainers want/need, technical difficulties, biz/sponsorship reasons? All fair enough.) Our future robot overlords require building and delivering themselves, your sedition is noted. :)

keep things inside /kaniko, to avoid tainting the image that's being built

Ah, guess I understand even less about how kaniko works than I'd presumed. The docs say it copies kaniko executables into the image, so I was thinking some adaptor was included along with the normal user-specified subset of the build context, and so making it concurrent and stateless (ish) was within reach.

only allowing one concurrent request

Ok, I'll next try automating the creation and teardown of a dedicated Lambda for each build, based on the same kaniko parent image.

ferbs avatar Jan 02 '22 20:01 ferbs

There is a lot of growth in this FaaS stuff but I haven't found build tooling that works natively. Maybe a good goal to work towards for kaniko after all?

I'd be curious to hear what else you've tried for building FaaS images -- buildpacks seems like the best option for that honestly, at least better than generating(?) and building Dockerfiles with kaniko.

Or if it's not a sensitive subject, I'd be curious to know the reason. (Not what the maintainers want/need, technical difficulties, biz/sponsorship reasons? All fair enough.)

Honestly, kaniko is only barely maintained at all, let alone imbued with any kind of roadmap aside from "stave off its inevitable death by bitrot".

The good news hidden in there is that if there's something you want to do to kaniko, and you can convince me it doesn't break existing users, you can probably get away with it. Want to make kaniko more concurrency-friendly? More Lambda-environment-friendly? I'd be happy to review any PRs that get sent my way. 😎

Our future robot overlords require building and delivering themselves, your sedition is noted. :)

lol, they'll have to catch me first.

Ah, guess I understand even less about how kaniko works than I'd presumed. The docs say it copies kaniko executables into the image, so I was thinking some adaptor was included along with the normal user-specified subset of the build context, and so making it concurrent and stateless (ish) was within reach.

We might have similar levels of understanding about how kaniko works, honestly. My impression was that kaniko kept itself and any necessary runtime data inside /kaniko, and made sure that while it was executing commands inside its container, that it just ignored that path when taking filesystem diffs to generate layer contents.

There might be other ignored paths, but e.g., if a Dockerfile directive writes to /tmp, that path should be preserved in the final image. So if the kaniko process itself writes something to /tmp/hello, that will end up in the image too, and the image would differ between kaniko and buildkit. 🐛
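The ignore behavior described here could look roughly like this (a minimal sketch; `ignoredPaths` and `skipInSnapshot` are illustrative names, and kaniko's real ignore list is more involved):

```go
package main

import (
	"fmt"
	"strings"
)

// ignoredPaths is a hypothetical ignore list; kaniko's actual list differs.
var ignoredPaths = []string{"/kaniko", "/proc", "/sys", "/dev"}

// skipInSnapshot reports whether a path is excluded from the filesystem diff
// that becomes a layer, per the /kaniko-ignoring behavior described above.
func skipInSnapshot(path string) bool {
	for _, ignored := range ignoredPaths {
		if path == ignored || strings.HasPrefix(path, ignored+"/") {
			return true
		}
	}
	return false
}

func main() {
	fmt.Println(skipInSnapshot("/kaniko/Dockerfile")) // true: never lands in a layer
	fmt.Println(skipInSnapshot("/tmp/hello"))         // false: preserved, hence the bug above
}
```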

imjasonh avatar Jan 03 '22 13:01 imjasonh

I misunderstood the how-it-works section until seeing your reply here. So it sounds like Kaniko transforms its own container's filesystem using directives in the target Dockerfile and then creates a snapshot of itself.

If so, I'd suggest you explicitly enforce one-run-per-container in the executor: it should detect a 2nd run and explode. Otherwise subsequent builds will contain confusing, hard-to-reproduce bugs. And using the official kaniko image seems like a fairly non-flexible requirement. (Also, no need for the /kaniko/<execution-ID>/Dockerfile stuff above.)

ferbs avatar Jan 03 '22 21:01 ferbs

what else you've tried for building FaaS images

I tried buildah first but gave up immediately after hitting fuse-related errors. My current WIP project does a lot of code generation for customers and a more flexible build step would unlock some cool features. I got bogged down in the research though and decided to defer on my ideas for customer containers and self-delivery. Then while investigating fallbacks, I came across @vfarcic's recommendation and thought I'd give Kaniko a try.

Re buildpacks, it looks like it creates its own Dockerfile. Prob fine in most cases but don't think it would work for some of my own images. (Eg, installing cgroups.) genuinetools/img looks like it's worth a try--later though, am past my self-imposed timecap.

ferbs avatar Jan 03 '22 21:01 ferbs

Re buildpacks, it looks like it creates its own Dockerfile. Prob fine in most cases but don't think it would work for some of my own images. (Eg, installing cgroups.) genuinetools/img looks like it's worth a try--later though, am past my self-imposed timecap.

Buildpacks definitely doesn't generate a Dockerfile, at least not any of the buildpacks implementations I've seen. The last comment before that issue is closed says that a buildpack implementation could support this if they wanted to, but that it shouldn't be a built-in behavior of pack or a buildpack lifecycle implementation.

I understand that being tied to a buildpack builder's implementation and limitations might be too constraining, and that writing your own may be too heavy of a lift, but I do think you might have a better time of that than of making Kaniko support being run inside Lambda. I got it to run inside Cloud Run for kontain.me, but that environment is more like the one it expects than Lambda. That being said, if there's anything I can help with, let me know.

If so, I'd suggest you explicitly enforce one-run-per-container in the executor, it should detect a 2nd run and explode. Otherwise subsequent builds will contain confusing, hard to reproduce bugs. And using the official kaniko image seems like a fairly non-flexible requirement. (Also, no need for the /kaniko/<execution-ID>/Dockerfile stuff above.)

This could be a good idea. For kontain.me (and I suspect for your Lambda usage), it might be difficult to handle this for subsequent requests, since they would explode and need to be retried on a fresh instance. I don't know how hard that is to configure on Lambda, or maybe in an API gateway in front of it. In a pathological case, an unlucky incoming request could get bounced across tons of already-used instances and potentially never find one that's usable; that would be a pretty bad experience.

I'd like to see if we can figure out how to get Kaniko to reset its environment better, or stuff ephemeral state into more reuse-friendly places. But, just as a warning, as noted in https://github.com/GoogleContainerTools/kaniko/issues/1868#issuecomment-1003323755:

Other than that, the warning is also saying "don't assume we'll be able to fix/prioritize bugs that only affect your unusual way of using kaniko", but that shouldn't stop you from experimenting.

imjasonh avatar Jan 04 '22 14:01 imjasonh

Hi @ferbs, I am currently looking to execute builds using kaniko inside AWS Lambda but running into multiple permission-related errors as kaniko tries to change owner/permissions on a Lambda's read-only filesystem. Looks like you had some success in running kaniko inside AWS Lambda. Do you mind sharing the sample code/repo so I can benefit from your learning? Thanks.

hariohmprasath avatar Jul 08 '22 17:07 hariohmprasath

@hariohmprasath, I'm not using Kaniko and didn't take the time to document my experiments, sorry. Not sure what you're trying, but the approach of extending kaniko with an AWS runtime interface client seems to be a dead end. There's a lot of activity around rootless builds, maybe try one of the projects based on buildkit?

ferbs avatar Jul 09 '22 00:07 ferbs