
Add async flow/network failure resilience to kubernetes driver

Open • droopy4096 opened this issue 2 years ago • 1 comment

Description

In situations where the Kubernetes connection is guarded, passes through a proxy, or is unstable, can we have the ability to launch builds asynchronously?

In more detail: can we have something like:

docker buildx build --detach ...

implemented, where the build is initiated and, once all necessary information has been passed to the builder, the connection is intentionally severed while the build completes autonomously?

With the above in place, a --poll option could be added for workflows that depend on build completion but run over an unstable connection:

docker buildx build --detach --poll ...

This would still launch an autonomous build but, having all the build information at hand, could periodically poll the build status and report back until completion. On the surface it would look like the present synchronous flow, but it would be more resilient to network failures.
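To illustrate the polling idea, here is a minimal sketch of what a client-side loop could look like. It is not buildx code: `poll_build`, `get_status`, and the state names are hypothetical, and it assumes the builder exposes some way to query a detached build's state that raises `ConnectionError` when the network is down.

```python
import time
from typing import Callable

# Hypothetical terminal states; buildx/BuildKit do not define these names.
TERMINAL_STATES = {"succeeded", "failed"}


def poll_build(
    get_status: Callable[[], str],
    interval: float = 5.0,
    max_consecutive_errors: int = 10,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Poll a detached build until it reaches a terminal state.

    get_status is a hypothetical callable that asks the builder for the
    build's state and raises ConnectionError when the network is down.
    """
    errors = 0
    while True:
        try:
            status = get_status()
            errors = 0  # any successful call resets the failure budget
            if status in TERMINAL_STATES:
                return status
        except ConnectionError:
            # The build runs on the builder, so a lost connection is not
            # a build failure; give up only after repeated failures.
            errors += 1
            if errors >= max_consecutive_errors:
                raise
        sleep(interval)
```

The point of the sketch is that a dropped connection only costs one polling attempt, because the build itself keeps running on the builder; the client gives up only after `max_consecutive_errors` failures in a row.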

droopy4096 • Jun 30 '23 13:06

We're still running into multiple issues with the buildx kubernetes driver:

  1. The default manifest for deployed buildkit pods allows for:
     a. jobs to be evicted mid-air due to cluster scaling events
     b. resource over-consumption due to lack of parallelization control
     c. "hangs" of buildx when connectivity to an evicted pod is lost

Further to this: #2056 may be useful for preserving state for each pod, allowing pod relocation and at least reuse of the existing cache to get back faster to the point where the failure occurred.

droopy4096 • Oct 18 '24 21:10