
32.2.3 image seems broken

Open YifeiZhuang opened this issue 9 months ago • 12 comments

The v32.2.3 image seems broken: the entrypoint changed, and presubmit tests are failing.


Cloud Build builds the images using ko, which always puts the binary under /ko-app:

docker inspect gcr.io/k8s-staging-cloud-provider-gcp/cloud-controller-manager:v32.2.3 -f '{{.Config.Entrypoint}}'
[/ko-app/cloud-controller-manager]

from @mmamczur
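
For comparison, running the same check against a pre-ko tag (v30.0.0 is used here only as an example of an image built the old way) should show an entrypoint without the /ko-app prefix:

docker inspect gcr.io/k8s-staging-cloud-provider-gcp/cloud-controller-manager:v30.0.0 -f '{{.Config.Entrypoint}}'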

YifeiZhuang avatar Mar 13 '25 18:03 YifeiZhuang

This issue is currently awaiting triage.

If the repository maintainers determine this is a relevant issue, they will accept it by applying the triage/accepted label and provide further guidance.

The triage/accepted label can be added by org members by writing /triage accepted in a comment.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

k8s-ci-robot avatar Mar 13 '25 18:03 k8s-ci-robot

The issue is caused by the build being done with https://github.com/ko-build/ko in https://github.com/kubernetes/cloud-provider-gcp/blob/master/tools/push-images#L45

That tool puts the binary in that path and sets the entrypoint accordingly.

The pod specifications assume the binary is at /cloud-controller-manager and set the command to that, ignoring the entrypoint.

Images built before (for example v30.0.0) were built differently, without ko.

This affects everything that was auto-built in Cloud Build, so basically everything after v30.0.0.
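
A minimal way to reproduce the mismatch locally, assuming Docker is available; the exec error below is what Docker typically reports when the requested path does not exist in the image:

docker run --rm --entrypoint /cloud-controller-manager gcr.io/k8s-staging-cloud-provider-gcp/cloud-controller-manager:v32.2.3 --version

This should fail with something like "stat /cloud-controller-manager: no such file or directory", while the same command with --entrypoint /ko-app/cloud-controller-manager should start the binary.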

mmamczur avatar Mar 14 '25 10:03 mmamczur

/assign @cpanato

YifeiZhuang avatar Mar 18 '25 18:03 YifeiZhuang

@YifeiZhuang can you point me to the job definitions that use /cloud-controller-manager?

cpanato avatar Mar 19 '25 14:03 cpanato

@cpanato https://github.com/kubernetes/kubernetes/blob/73f54b67b29d77601b0bd42ad8b4992925b9df47/cluster/gce/manifests/cloud-controller-manager.manifest#L25-L37

I would prefer to remove the /ko-app path and not modify the existing manifests.

aojea avatar Mar 19 '25 15:03 aojea

Checking that, thanks for the pointer.

cpanato avatar Mar 19 '25 15:03 cpanato

@cpanato that is the manifest for the job that runs in k/k; the jobs that run from this repo have their manifests here (the kube-up scripts in this repo are copied from k/k):

https://github.com/search?q=repo%3Akubernetes%2Fcloud-provider-gcp%20cloud-controller-manager&type=code
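
A rough local equivalent of that search, assuming a checkout of kubernetes/cloud-provider-gcp (the file patterns are a guess at where the manifests live):

grep -rn 'cloud-controller-manager' --include='*.manifest' --include='*.yaml' .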

aojea avatar Mar 19 '25 15:03 aojea

The ko path-configuration issue is still under debate upstream and is not going to be resolved soon: https://github.com/ko-build/ko/issues/944

It looks like migrating to a Makefile is the standard now? This repo is still using Bazel, so could we replace the ko build in the push-images script with this command?

IMAGE_REGISTRY=example.com IMAGE_REPO=my-repo IMAGE_TAG=v1 bazel run //cmd/cloud-controller-manager:publish
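
Whatever approach we pick, the published image can be sanity-checked the same way as above; the image name here is only a guess based on the example variables:

docker inspect example.com/my-repo/cloud-controller-manager:v1 -f '{{.Config.Entrypoint}}'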

thoughts? @cpanato

YifeiZhuang avatar Mar 27 '25 18:03 YifeiZhuang

@YifeiZhuang Sorry for the delay. Yes, that will not happen any time soon, and if we decide to keep using ko, we just need to update the job definitions. To be honest, I don't know why the manifest defines the entrypoint.

It will be a project decision whether to stay with Bazel or get rid of it. We can also choose to build the image with Docker instead of Ko.

I think we should keep ko and update the jobs that use the wrong entry point.

The snippet above seems correct; did you build the image and run it locally to check?

cpanato avatar Mar 28 '25 07:03 cpanato

@cpanato as the linked issue https://github.com/ko-build/ko/issues/944 explains:

The problem is, file paths in the container are, unfortunately, sometimes API.

so I do not think updating all the manifests across the ecosystem is feasible.

Instead of Bazel and ko we can just create a Dockerfile and build it with docker build; that is simpler and easier to maintain.
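
A rough sketch of what such a Dockerfile could look like; the Go version, base image, and build path are assumptions, not necessarily what the repo would use:

# Build stage (Go version is an assumption)
FROM golang:1.24 AS build
WORKDIR /src
COPY . .
RUN CGO_ENABLED=0 go build -o /cloud-controller-manager ./cmd/cloud-controller-manager
# Runtime stage: put the binary at the path the existing manifests expect
FROM gcr.io/distroless/static:nonroot
COPY --from=build /cloud-controller-manager /cloud-controller-manager
ENTRYPOINT ["/cloud-controller-manager"]

Placing the binary back at /cloud-controller-manager and setting the entrypoint to it restores the path that the existing manifests hardcode.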

aojea avatar Mar 28 '25 07:03 aojea

@aojea sgtm, should I move forward with that, or wait to hear @YifeiZhuang's thoughts?

cpanato avatar Mar 28 '25 07:03 cpanato

@cpanato you can move forward with https://github.com/kubernetes/cloud-provider-gcp/pull/825, please feel free to take it; I just uploaded it as a PoC. In this case there are not many alternatives, and a Dockerfile is clearly the one that ticks all the boxes.

aojea avatar Mar 28 '25 08:03 aojea

@YifeiZhuang @aojea ok, cool

I am working on splitting the cloudbuild config so there is one just for the CCM and we have more control; doing some tests now, will open a PR soon.
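
For reference, a minimal standalone Cloud Build config for just the CCM image could look roughly like this; the image name, substitution variable, and build context are assumptions:

steps:
  - name: 'gcr.io/cloud-builders/docker'
    args: ['build', '-t', 'gcr.io/$PROJECT_ID/cloud-controller-manager:$_TAG', '.']
images:
  - 'gcr.io/$PROJECT_ID/cloud-controller-manager:$_TAG'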

cpanato avatar Apr 01 '25 12:04 cpanato

So a few things I am trying to tackle:

[ ] Ensure that we test the Dockerfile build in our e2e (particularly the kops-simple scenario). Hopefully we start seeing the breakage in our e2e, preventing regression.
[ ] Fix the Dockerfile so that we restore the original entrypoint
[ ] Add arm64 building and make the image multi-arch (see the sketch after this list)
[ ] Add some more test coverage to generally be able to catch simple problems (e.g. more in-repo tests)
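
For the multi-arch item, a hedged sketch of what the build could look like with docker buildx; the registry and tag are placeholders:

docker buildx build --platform linux/amd64,linux/arm64 -t gcr.io/k8s-staging-cloud-provider-gcp/cloud-controller-manager:dev --push .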

justinsb avatar Apr 15 '25 16:04 justinsb

The Kubernetes project currently lacks enough contributors to adequately respond to all issues.

This bot triages un-triaged issues according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue as fresh with /remove-lifecycle stale
  • Close this issue with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

k8s-triage-robot avatar Jul 14 '25 17:07 k8s-triage-robot

/remove-lifecycle stale

cpanato avatar Jul 15 '25 07:07 cpanato

The Kubernetes project currently lacks enough contributors to adequately respond to all issues.

This bot triages un-triaged issues according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue as fresh with /remove-lifecycle stale
  • Close this issue with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

k8s-triage-robot avatar Oct 13 '25 07:10 k8s-triage-robot

The Kubernetes project currently lacks enough active contributors to adequately respond to all issues.

This bot triages un-triaged issues according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue as fresh with /remove-lifecycle rotten
  • Close this issue with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle rotten

k8s-triage-robot avatar Nov 12 '25 07:11 k8s-triage-robot