cloud-provider-gcp

Error: failed to get instance metadata

Open cpanato opened this issue 2 years ago • 10 comments

In the e2e Conformance CI Artifacts tests for CAPG, we are seeing flaky failures when bootstrapping the workload clusters using the GCP CCM. See Testgrid: https://testgrid.k8s.io/sig-cluster-lifecycle-cluster-api-provider-gcp#capg-conformance-main-ci-artifacts

It looks like the tests pass in some GCP projects but fail in others. In the CCM logs we can see the following errors:

2023-11-16T13:52:57.701624403Z stderr F I1116 13:52:57.701427       1 node_controller.go:431] Initializing node capg-conf-lhoc9s-md-0-szjmb with cloud provider
2023-11-16T13:52:57.77723051Z stderr F I1116 13:52:57.776860       1 gen.go:17904] GCEInstances.Get(context.Background.WithDeadline(2023-11-16 14:52:57.702153712 +0000 UTC m=+3643.514258378 [59m59.925284681s]), Key{"capg-conf-lhoc9s-md-0-szjmb", zone: "us-east4-c"}) = <nil>, googleapi: Error 404: The resource 'projects/k8s-infra-e2e-boskos-088/zones/us-east4-c/instances/capg-conf-lhoc9s-md-0-szjmb' was not found, notFound
2023-11-16T13:52:57.777357462Z stderr F E1116 13:52:57.777125       1 node_controller.go:240] error syncing 'capg-conf-lhoc9s-md-0-szjmb': failed to get instance metadata for node capg-conf-lhoc9s-md-0-szjmb: failed to get instance ID from cloud provider: instance not found, requeuing

This seems like a missing permission in the project, but I am not 100% sure.

full logs: logs.log

Tracking Issue: https://github.com/kubernetes/kubernetes/issues/120481

cc @aojea

cpanato avatar Dec 03 '23 17:12 cpanato

This issue is currently awaiting triage.

If the repository maintainers determine this is a relevant issue, they will accept it by applying the triage/accepted label and provide further guidance.

The triage/accepted label can be added by org members by writing /triage accepted in a comment.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

k8s-ci-robot avatar Dec 03 '23 17:12 k8s-ci-robot

Looks like this is related to https://github.com/kubernetes/kubernetes/pull/120615; we need to re-vendor to get that change /cc @sdmodi

aojea avatar Dec 05 '23 22:12 aojea

Error 404: The resource 'projects/k8s-infra-e2e-boskos-088/zones/us-east4-c/instances/capg-conf-lhoc9s-md-0-szjmb' was not found, notFound

It is a notFound error, no? It should not be related to permissions.

aojea avatar Dec 05 '23 22:12 aojea

So we need some documentation on which permissions need to be set in GCP,

and then we need to make sure those permissions are set during the tests in prow/boskos.

cpanato avatar Dec 06 '23 18:12 cpanato

The Kubernetes project currently lacks enough contributors to adequately respond to all issues.

This bot triages un-triaged issues according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue as fresh with /remove-lifecycle stale
  • Close this issue with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

k8s-triage-robot avatar Mar 05 '24 18:03 k8s-triage-robot

The Kubernetes project currently lacks enough active contributors to adequately respond to all issues.

This bot triages un-triaged issues according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue as fresh with /remove-lifecycle rotten
  • Close this issue with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle rotten

k8s-triage-robot avatar Apr 04 '24 19:04 k8s-triage-robot

/remove-lifecycle rotten

cpanato avatar Apr 05 '24 07:04 cpanato

@cpanato can you elaborate a little on what needs documenting? Starting to plan with @shannonxtreme which technical details we need to dig up and then write good docs for 😅 xref #686

BenTheElder avatar May 07 '24 21:05 BenTheElder

The Kubernetes project currently lacks enough contributors to adequately respond to all issues.

This bot triages un-triaged issues according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue as fresh with /remove-lifecycle stale
  • Close this issue with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

k8s-triage-robot avatar Aug 05 '24 21:08 k8s-triage-robot

The Kubernetes project currently lacks enough active contributors to adequately respond to all issues.

This bot triages un-triaged issues according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue as fresh with /remove-lifecycle rotten
  • Close this issue with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle rotten

k8s-triage-robot avatar Sep 04 '24 21:09 k8s-triage-robot

The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.

This bot triages issues according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Reopen this issue with /reopen
  • Mark this issue as fresh with /remove-lifecycle rotten
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/close not-planned

k8s-triage-robot avatar Oct 04 '24 22:10 k8s-triage-robot

@k8s-triage-robot: Closing this issue, marking it as "Not Planned".

In response to this:

The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.

This bot triages issues according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Reopen this issue with /reopen
  • Mark this issue as fresh with /remove-lifecycle rotten
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/close not-planned

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

k8s-ci-robot avatar Oct 04 '24 22:10 k8s-ci-robot