skaffold: Hooks are not executed if image is already build
How to categorize this issue?
/area dev-productivity /kind bug
What happened:
make operator-up can fail with:
running [kustomize build /go/src/github.com/gardener/gardener/example/provider-local/garden/operator]
- stdout: ""
- stderr: "Error: trouble configuring builtin PatchTransformer with config: `\npath: patch-imagevector-overwrite.yaml\n`: failed to get the patch file from path(patch-imagevector-overwrite.yaml): evalsymlink failure on '/go/src/github.com/gardener/gardener/example/provider-local/garden/operator/patch-imagevector-overwrite.yaml' : lstat /go/src/github.com/gardener/gardener/example/provider-local/garden/operator/patch-imagevector-overwrite.yaml: no such file or directory\n"
- cause: exit status 1
make: *** [operator-up] Error 1
https://github.com/gardener/gardener/pull/10107 switched the generation of example/provider-local/garden/operator/patch-imagevector-overwrite.yaml to happen via hook and not via buildCommand.
However hooks are not executed if you locally have the image already built and cached. Hence, the patch-imagevector-overwrite.yaml file is not generated and it fails with the above error.
What you expected to happen:
patch-imagevector-overwrite.yaml to be generated even if the image is already built and cached.
How to reproduce it (as minimally and precisely as possible):
- Run
make kind-upandmake gardener-up. This will build and cache thelocal-skaffold/machine-controller-manager-provider-localimage. - Delete
example/provider-local/garden/operator/patch-imagevector-overwrite.yamlif you already have it. - Run
make kind-operator-upandmake operator-up - Make sure
make operator-upfails with
Anything else we need to know?:
Environment:
- Gardener version: 8b808dad44e233c0f9d648e154ac310d3141a761
- Kubernetes version (use
kubectl version): - Cloud provider or hardware configuration:
- Others:
cc @rrhubenov
Maybe this issue exists even with buildCommand, I haven't checked whether it got introduced with the switch to the hooks.
Hi Ismo, this can most likely be fixed by adding --cache-artifacts=false to every target in the Makefile that runs skaffold.
Eg. https://github.com/gardener/gardener/blob/c97814b56f2bb3358a6cadba507b7d4e88e3dcc3/Makefile#L346-L347
Thanks for catching the issue, I'll reproduce it try the fix and open a PR if it works.
Wouldn't this lead to Skaffold always rebuilding all images, thus increased turnaround times?
@rfranzke You're most likely right. I'll take a look at some sort of conditional caching, but as far as I've seen, skaffold does not have such a feature.
I'll take a look at possible solutions, check if this issue was happening with the previous version that used buildCommand and report back
I vaguely recall hitting it even with buildCommand, most likely I was wrong associating it the switch to the hooks.
It is not super likely to hit this issue. @andrerun hit it because https://github.com/gardener/gardener/issues/10006 introduced yesterday a new image with such hook and file + .gitignore for that generated file. And Andrey already had the corresponding image built by make gardener-up.
So, @rrhubenov , don't feel obligated to look into this issue, as I said most likely it was happening before your change.
Without verifying I'd also assume that the issue had been happening even with buildCommand, since it does the same thing as the hook, namely just creating the patch-imagevector-overwrite.yaml file.
I'll take a look at this when I have some free time so we can at least mark the issue as documented and well-known.
FYI: https://github.com/gardener/gardener/pull/10161 introduced more such build hooks.
There are some chances for weird errors here 😅
Assuming you work on multiple PRs which update gardenlet simultaneously. When you have built all versions and start switching back and forth between the branches each make gardener-up will change the image, but not the files changed by the hooks anymore 🤔
The Gardener project currently lacks enough active contributors to adequately respond to all issues. This bot triages issues according to the following rules:
- After 90d of inactivity,
lifecycle/staleis applied - After 30d of inactivity since
lifecycle/stalewas applied,lifecycle/rottenis applied - After 30d of inactivity since
lifecycle/rottenwas applied, the issue is closed
You can:
- Mark this issue as fresh with
/remove-lifecycle stale - Mark this issue as rotten with
/lifecycle rotten - Close this issue with
/close
/lifecycle stale
cc @marc1404
Sounds a bit similar to https://github.com/gardener/gardener/pull/10710. The make operator-dev make target introduced by you would at least write the overwrite files in question here again, as the image tag changes with each build (or so I would assume).
Sounds a bit similar to #10710. The
make operator-devmake target introduced by you would at least write the overwrite files in question here again, as the image tag changes with each build (or so I would assume).
Indeed it does! I've ran into a similar issue while playing around locally until I hit the case that the patch-imagevector-override.yaml was not recreated since the images were already cached.
Skaffold documents that the only way to force build hooks to run is by passing --cache-artifacts=false (or setting SKAFFOLD_CACHE_ARTIFACTS=false).
before-build and after-build Build hooks are executed before and after each artifact is built. If an artifact is not built, such as happens when the image was found in the Skaffold image cache, then the build hooks will not be executed. To force the build hooks, run Skaffold with --cache-artifacts=false option.
– https://skaffold.dev/docs/lifecycle-hooks/#before-build-and-after-build
Of course, outright disabling Skaffold's caching feature would worsen developer productivity.
One way I can think of to remedy the issue described by @ialidzhikov to have a small script that checks whether patch-imagevector-overwrite.yaml exists (optionally also checking that the content contains the images). In the Makefile we could then determine whether to disable Skaffold's cache if necessary (forcing the file to be regenerated).
However, it would not fix the potential issue that @oliver-goetz describes that I also ran into while frantically trying different changes until I realized that patch-imagevector-overwrite.yaml was not being updated anymore.
@marc1404 , I am not sure if https://github.com/gardener/gardener/pull/10958 is covering all the possible cases.
The issue I am hitting now with the registry-cache extension is the following (I guess it should be possible to reproduce the same for provider-local):
- I am on branch
foo. Thelocal-setup/patch-controllerdeployment-image.yamlfile exists with versionv0.13.0-dev-1234 - I switch to branch
bar. - I run skaffold
- The images are cached ones. Hence, hooks are not executed and it does not update the version in
local-setup/patch-controllerdeployment-image.yaml. skaffold completes without updating the version in the ControllerDeployment. The version in the ControllerDeployment still points to the image from the old branchfoo(v0.13.0-dev-1234) while I ran skaffold on the new branchbar.
https://github.com/gardener/gardener/pull/10958 seems to cover only the case when the patch files do not exist.
@ialidzhikov You're correct, please see the limitation section of #10958 at the end:
Limitation: ... Unfortunately, this approach fails to fix https://github.com/gardener/gardener/issues/10209#issuecomment-2252985074. This would require a more complex fix but I think it's acceptable given that it's a bit of an edge case and even somewhat the responsibility of developers to be aware of potential caching issues when changing branches between Skaffold builds.
☝ (Disregard what I said there about it being an edge case) 🙃
I agree that more needs to be done to find a thorough solution. Feel free to reopen this issue however you see fit. Since you describe the case of switching branches and images being cached: Perhaps we could check the Git commit reference that the patch file(s) contain and if it doesn't match against the current branch Skaffold's cache should be skipped as well.
For the registry-cache extension, we resolved the issue for us (ref https://github.com/gardener/gardener-extension-registry-cache/issues/305) by eliminating the skaffold hook (ref https://github.com/gardener/gardener-extension-registry-cache/pull/357).
/assign @ialidzhikov @dimitar-kostadinov
@ialidzhikov: GitHub didn't allow me to assign the following users: dimitar-kostadinov.
Note that only gardener members with read permissions, repo collaborators and people who have commented on this issue/PR can be assigned. Additionally, issues/PRs can only have 10 assignees at the same time. For more information please see the contributor guide
In response to this:
/assign @ialidzhikov @dimitar-kostadinov
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.