Debugging a build description is painful
This is more of a feature request.
When I work on a build description, I find myself repeating these steps over and over:
- kubectl delete -f build.yaml --ignore-not-found=true && kubectl apply -f build.yaml
- Do the usual kung-fu to read the logs
- Realize there's an error. Not a syntax error but something I forgot or didn't understand when writing the yaml descriptor.
- Change something
- kubectl delete -f build.yaml
- Go to 1
What would be super nice is being able to run a build "locally" with only Docker installed, using some kind of wrapper that reproduces the logic of the build CRD without all the complexity added by Kubernetes. No steps, no initialization containers, direct printing of the logs on stdout...
Something exactly like https://github.com/GoogleCloudPlatform/container-builder-local
Most CI/CD systems fall short of providing a local story that allows faster debugging. And by "local", I don't mean using minikube or D4D. Those are not local. Those are remote clusters that are not too far away.
Having access to the workspace locally, both while the build is running and after it has finished, is a huge help too.
I think this bug might be conflating two things:
- a bug: "descriptor errors are hard to debug"
- a feature request: "I want a local off-cluster executor"
For the first, #8 should help a bit, since we can reject invalid build configs with custom hand-written error messages. If you have examples of specific build descriptor errors that were frustrating to debug, let me know and we'll make sure they're better in the new validation scheme.
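For illustration, here's a hedged sketch of the kind of non-syntax mistake that only surfaces at runtime today (the names and images are made up): the step below uses a ${IMAGE} placeholder as if a template parameter were in scope, but no template is referenced, so the string is passed through literally and the failure only shows up deep in the step's logs rather than at apply time.

```yaml
# Hypothetical example of a descriptor mistake that today only surfaces at runtime.
apiVersion: build.knative.dev/v1alpha1
kind: Build
metadata:
  name: broken-build
spec:
  source:
    git:
      url: https://github.com/example/repo.git
      revision: master
  steps:
  - name: build-and-push
    image: gcr.io/kaniko-project/executor
    args:
    - --dockerfile=/workspace/Dockerfile
    - --destination=${IMAGE}   # no template is used, so ${IMAGE} is never substituted
```

This is the sort of case where an up-front rejection with a clear message would save a whole delete/apply/read-logs cycle.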
For the second, I don't think we plan to provide a Build CRD executor that doesn't run on and take advantage of Kubernetes, because that would require its own custom implementation separate from Kubernetes, inevitably with its own divergent set of bugs, which defeats the whole point. In the case of container-builder-local, the code in that package is literally the same code that runs on the GCB worker VM, so the implementation doesn't diverge (...mostly...).
One strong option would be to integrate with Skaffold so that you can quickly iterate on your build definitions, with source from your own local workspace. That would definitely give a tighter feedback loop than having to push to Git or GCS, and since the Build is executing on a Kubernetes cluster (incl. Minikube), you've got the same underlying execution layer.
AFAIK Skaffold doesn't have a facility to persist the cluster's workspace after the build executes (or fails), so maybe it only solves half of the problem. I'm sure we could come up with something though, if it's a significant problem.
How is that different from reimplementing the whole logic? Skaffold would have to do that, right?
Perhaps I have an incomplete understanding of Skaffold's model, but I had assumed it would execute the build on the cluster (local or remote) that it uses to execute the rest of its workload.
I believe this is how CBI (another CRD for describing container builds) proposes to integrate with Skaffold. Please correct me if I'm wrong.
@ImJasonH You are right, Skaffold could launch a patched version of the build using kubectl apply
This would:
- Remove the need to push to git or GCS. (How do we pass the sources to the build then?)
- Could do the log-fu for us. \o/
- Could clean up the build for us afterwards
It's clearly missing the retrieval of the workspace, though. How would I do that right now? Can I access the workspace after the pod has completed or failed?
It's not possible today, but I could imagine some changes that would enable it. Maybe some option to specify the volume that backs /workspace, instead of the emptyDir it is today? Then you could specify a persistent volume that you can inspect afterwards.
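A rough sketch of what that could look like, assuming a hypothetical workspaceVolume field that does not exist in the Build API today:

```yaml
# Hypothetical sketch only: /workspace is always an emptyDir today.
# This imagines an option to back it with a PersistentVolumeClaim so the
# contents survive after the build completes or fails.
apiVersion: build.knative.dev/v1alpha1
kind: Build
metadata:
  name: debuggable-build
spec:
  # Imaginary field, not part of the current API.
  workspaceVolume:
    persistentVolumeClaim:
      claimName: build-workspace-pvc
  steps:
  - name: compile
    image: golang:1.10
    workingDir: /workspace
    command: ["go", "build", "./..."]
```

You could then mount build-workspace-pvc into a throwaway debug pod (and kubectl cp out of it) to poke around the workspace after the fact.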
I think the Skaffold approach would still need to upload the local workspace to GCS (possibly incrementally!) to fetch it in the build. It looks like Skaffold requires this today anyway for, e.g., kaniko builds.
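For reference, a hedged sketch of what such a wrapper might apply after tarring up and uploading the local workspace, assuming the GCS Archive source type behaves as described above (bucket, object, and image names are placeholders):

```yaml
# Sketch: a Skaffold-style wrapper uploads the local workspace as an archive,
# then applies a Build that fetches it via the GCS source.
apiVersion: build.knative.dev/v1alpha1
kind: Build
metadata:
  name: local-workspace-build
spec:
  source:
    gcs:
      type: Archive
      location: gs://my-staging-bucket/workspace-snapshot.tar.gz
  steps:
  - name: build
    image: gcr.io/kaniko-project/executor
    args:
    - --dockerfile=/workspace/Dockerfile
    - --destination=gcr.io/my-project/app:debug
```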
Issues go stale after 90d of inactivity. Mark the issue as fresh with /remove-lifecycle stale. Stale issues rot after an additional 30d of inactivity and eventually close.
If this issue is safe to close now please do so with /close.
Send feedback to Knative Productivity Slack channel or knative/test-infra.
/lifecycle stale