gardener icon indicating copy to clipboard operation
gardener copied to clipboard

skaffold: Hooks are not executed if image is already build

Open ialidzhikov opened this issue 1 year ago • 9 comments

How to categorize this issue?

/area dev-productivity /kind bug

What happened:

make operator-up can fail with:

running [kustomize build /go/src/github.com/gardener/gardener/example/provider-local/garden/operator]
 - stdout: ""
 - stderr: "Error: trouble configuring builtin PatchTransformer with config: `\npath: patch-imagevector-overwrite.yaml\n`: failed to get the patch file from path(patch-imagevector-overwrite.yaml): evalsymlink failure on '/go/src/github.com/gardener/gardener/example/provider-local/garden/operator/patch-imagevector-overwrite.yaml' : lstat /go/src/github.com/gardener/gardener/example/provider-local/garden/operator/patch-imagevector-overwrite.yaml: no such file or directory\n"
 - cause: exit status 1
make: *** [operator-up] Error 1

https://github.com/gardener/gardener/pull/10107 switched the generation of example/provider-local/garden/operator/patch-imagevector-overwrite.yaml to happen via hook and not via buildCommand.

However hooks are not executed if you locally have the image already built and cached. Hence, the patch-imagevector-overwrite.yaml file is not generated and it fails with the above error.

What you expected to happen: patch-imagevector-overwrite.yaml to be generated even if the image is already built and cached.

How to reproduce it (as minimally and precisely as possible):

  1. Run make kind-up and make gardener-up. This will build and cache the local-skaffold/machine-controller-manager-provider-local image.
  2. Delete example/provider-local/garden/operator/patch-imagevector-overwrite.yaml if you already have it.
  3. Run make kind-operator-up and make operator-up
  4. Make sure make operator-up fails with

Anything else we need to know?:

Environment:

  • Gardener version: 8b808dad44e233c0f9d648e154ac310d3141a761
  • Kubernetes version (use kubectl version):
  • Cloud provider or hardware configuration:
  • Others:

ialidzhikov avatar Jul 26 '24 14:07 ialidzhikov

cc @rrhubenov

ialidzhikov avatar Jul 26 '24 14:07 ialidzhikov

Maybe this issue exists even with buildCommand, I haven't checked whether it got introduced with the switch to the hooks.

ialidzhikov avatar Jul 26 '24 14:07 ialidzhikov

Hi Ismo, this can most likely be fixed by adding --cache-artifacts=false to every target in the Makefile that runs skaffold. Eg. https://github.com/gardener/gardener/blob/c97814b56f2bb3358a6cadba507b7d4e88e3dcc3/Makefile#L346-L347

Thanks for catching the issue, I'll reproduce it try the fix and open a PR if it works.

rrhubenov avatar Jul 26 '24 14:07 rrhubenov

Wouldn't this lead to Skaffold always rebuilding all images, thus increased turnaround times?

rfranzke avatar Jul 26 '24 14:07 rfranzke

@rfranzke You're most likely right. I'll take a look at some sort of conditional caching, but as far as I've seen, skaffold does not have such a feature. I'll take a look at possible solutions, check if this issue was happening with the previous version that used buildCommand and report back

rrhubenov avatar Jul 26 '24 14:07 rrhubenov

I vaguely recall hitting it even with buildCommand, most likely I was wrong associating it the switch to the hooks. It is not super likely to hit this issue. @andrerun hit it because https://github.com/gardener/gardener/issues/10006 introduced yesterday a new image with such hook and file + .gitignore for that generated file. And Andrey already had the corresponding image built by make gardener-up.

So, @rrhubenov , don't feel obligated to look into this issue, as I said most likely it was happening before your change.

ialidzhikov avatar Jul 26 '24 14:07 ialidzhikov

Without verifying I'd also assume that the issue had been happening even with buildCommand, since it does the same thing as the hook, namely just creating the patch-imagevector-overwrite.yaml file.

I'll take a look at this when I have some free time so we can at least mark the issue as documented and well-known.

rrhubenov avatar Jul 26 '24 14:07 rrhubenov

FYI: https://github.com/gardener/gardener/pull/10161 introduced more such build hooks.

rfranzke avatar Jul 26 '24 14:07 rfranzke

There are some chances for weird errors here 😅 Assuming you work on multiple PRs which update gardenlet simultaneously. When you have built all versions and start switching back and forth between the branches each make gardener-up will change the image, but not the files changed by the hooks anymore 🤔

oliver-goetz avatar Jul 26 '24 15:07 oliver-goetz

The Gardener project currently lacks enough active contributors to adequately respond to all issues. This bot triages issues according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue as fresh with /remove-lifecycle stale
  • Mark this issue as rotten with /lifecycle rotten
  • Close this issue with /close

/lifecycle stale

gardener-ci-robot avatar Oct 24 '24 15:10 gardener-ci-robot

cc @marc1404 Sounds a bit similar to https://github.com/gardener/gardener/pull/10710. The make operator-dev make target introduced by you would at least write the overwrite files in question here again, as the image tag changes with each build (or so I would assume).

LucaBernstein avatar Oct 25 '24 10:10 LucaBernstein

Sounds a bit similar to #10710. The make operator-dev make target introduced by you would at least write the overwrite files in question here again, as the image tag changes with each build (or so I would assume).

Indeed it does! I've ran into a similar issue while playing around locally until I hit the case that the patch-imagevector-override.yaml was not recreated since the images were already cached.

Skaffold documents that the only way to force build hooks to run is by passing --cache-artifacts=false (or setting SKAFFOLD_CACHE_ARTIFACTS=false).

before-build and after-build Build hooks are executed before and after each artifact is built. If an artifact is not built, such as happens when the image was found in the Skaffold image cache, then the build hooks will not be executed. To force the build hooks, run Skaffold with --cache-artifacts=false option.

– https://skaffold.dev/docs/lifecycle-hooks/#before-build-and-after-build

Of course, outright disabling Skaffold's caching feature would worsen developer productivity.

One way I can think of to remedy the issue described by @ialidzhikov to have a small script that checks whether patch-imagevector-overwrite.yaml exists (optionally also checking that the content contains the images). In the Makefile we could then determine whether to disable Skaffold's cache if necessary (forcing the file to be regenerated).

However, it would not fix the potential issue that @oliver-goetz describes that I also ran into while frantically trying different changes until I realized that patch-imagevector-overwrite.yaml was not being updated anymore.

marc1404 avatar Oct 25 '24 15:10 marc1404

@marc1404 , I am not sure if https://github.com/gardener/gardener/pull/10958 is covering all the possible cases.

The issue I am hitting now with the registry-cache extension is the following (I guess it should be possible to reproduce the same for provider-local):

  1. I am on branch foo. The local-setup/patch-controllerdeployment-image.yaml file exists with version v0.13.0-dev-1234
  2. I switch to branch bar.
  3. I run skaffold
  4. The images are cached ones. Hence, hooks are not executed and it does not update the version in local-setup/patch-controllerdeployment-image.yaml. skaffold completes without updating the version in the ControllerDeployment. The version in the ControllerDeployment still points to the image from the old branch foo (v0.13.0-dev-1234) while I ran skaffold on the new branch bar.

https://github.com/gardener/gardener/pull/10958 seems to cover only the case when the patch files do not exist.

ialidzhikov avatar Dec 12 '24 14:12 ialidzhikov

@ialidzhikov You're correct, please see the limitation section of #10958 at the end:

Limitation: ... Unfortunately, this approach fails to fix https://github.com/gardener/gardener/issues/10209#issuecomment-2252985074. This would require a more complex fix but I think it's acceptable given that it's a bit of an edge case and even somewhat the responsibility of developers to be aware of potential caching issues when changing branches between Skaffold builds.

(Disregard what I said there about it being an edge case) 🙃

I agree that more needs to be done to find a thorough solution. Feel free to reopen this issue however you see fit. Since you describe the case of switching branches and images being cached: Perhaps we could check the Git commit reference that the patch file(s) contain and if it doesn't match against the current branch Skaffold's cache should be skipped as well.

marc1404 avatar Dec 12 '24 15:12 marc1404

For the registry-cache extension, we resolved the issue for us (ref https://github.com/gardener/gardener-extension-registry-cache/issues/305) by eliminating the skaffold hook (ref https://github.com/gardener/gardener-extension-registry-cache/pull/357).

ialidzhikov avatar Feb 20 '25 07:02 ialidzhikov

/assign @ialidzhikov @dimitar-kostadinov

ialidzhikov avatar Feb 20 '25 15:02 ialidzhikov

@ialidzhikov: GitHub didn't allow me to assign the following users: dimitar-kostadinov.

Note that only gardener members with read permissions, repo collaborators and people who have commented on this issue/PR can be assigned. Additionally, issues/PRs can only have 10 assignees at the same time. For more information please see the contributor guide

In response to this:

/assign @ialidzhikov @dimitar-kostadinov

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

gardener-prow[bot] avatar Feb 20 '25 15:02 gardener-prow[bot]