Error: failed to get instance metadata
In the e2e Conformance CI Artifacts tests for CAPG, we are seeing flaky failures when bootstrapping the workload clusters with the GCP CCM (cloud-controller-manager). See Testgrid: https://testgrid.k8s.io/sig-cluster-lifecycle-cluster-api-provider-gcp#capg-conformance-main-ci-artifacts
It looks like the tests pass in some GCP projects but fail in others. In the CCM logs we can see the following errors:
2023-11-16T13:52:57.701624403Z stderr F I1116 13:52:57.701427 1 node_controller.go:431] Initializing node capg-conf-lhoc9s-md-0-szjmb with cloud provider
2023-11-16T13:52:57.77723051Z stderr F I1116 13:52:57.776860 1 gen.go:17904] GCEInstances.Get(context.Background.WithDeadline(2023-11-16 14:52:57.702153712 +0000 UTC m=+3643.514258378 [59m59.925284681s]), Key{"capg-conf-lhoc9s-md-0-szjmb", zone: "us-east4-c"}) = <nil>, googleapi: Error 404: The resource 'projects/k8s-infra-e2e-boskos-088/zones/us-east4-c/instances/capg-conf-lhoc9s-md-0-szjmb' was not found, notFound
2023-11-16T13:52:57.777357462Z stderr F E1116 13:52:57.777125 1 node_controller.go:240] error syncing 'capg-conf-lhoc9s-md-0-szjmb': failed to get instance metadata for node capg-conf-lhoc9s-md-0-szjmb: failed to get instance ID from cloud provider: instance not found, requeuing
This seems like a missing permission in the project, but I'm not 100% sure.
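For what it's worth, a missing IAM permission would normally come back from the GCE API as a 403, whereas the log above shows a 404 (resource not found). A minimal sketch, not part of this issue, that replays the same lookup with google.golang.org/api/compute/v1 to tell the two apart (project/zone/instance names copied from the log excerpt above):

```go
// Sketch: distinguish a permissions failure (403) from a genuinely
// missing instance (404), which is what the CCM log above reports.
package main

import (
	"context"
	"errors"
	"fmt"
	"log"

	compute "google.golang.org/api/compute/v1"
	"google.golang.org/api/googleapi"
)

func main() {
	ctx := context.Background()
	svc, err := compute.NewService(ctx) // uses Application Default Credentials
	if err != nil {
		log.Fatal(err)
	}
	// Names copied from the log excerpt above.
	_, err = svc.Instances.Get("k8s-infra-e2e-boskos-088", "us-east4-c",
		"capg-conf-lhoc9s-md-0-szjmb").Context(ctx).Do()
	var gerr *googleapi.Error
	if errors.As(err, &gerr) {
		switch gerr.Code {
		case 403:
			fmt.Println("IAM problem: caller may not read the instance")
		case 404:
			fmt.Println("instance does not exist; not a permissions issue")
		default:
			fmt.Printf("unexpected API error: %v\n", gerr)
		}
	}
}
```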
full logs: logs.log
Tracking Issue: https://github.com/kubernetes/kubernetes/issues/120481
cc @aojea
This issue is currently awaiting triage.
If the repository maintainers determine this is a relevant issue, they will accept it by applying the triage/accepted label and provide further guidance.
The triage/accepted label can be added by org members by writing /triage accepted in a comment.
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
Looks like this is related to https://github.com/kubernetes/kubernetes/pull/120615; we need to re-vendor to pick up that change. /cc @sdmodi
Error 404: The resource 'projects/k8s-infra-e2e-boskos-088/zones/us-east4-c/instances/capg-conf-lhoc9s-md-0-szjmb' was not found, notFound
It is a notFound error, no? It should not be related to permissions.
So we need some documentation on which permissions need to be set in GCP, and then make sure those permissions are set for the test projects in prow/boskos (one way to verify them is sketched below).
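For illustration, a hypothetical pre-flight check that the test service account holds the permission the failing call needs. This is a sketch under assumptions: it assumes compute.instances.get is the permission behind the failing Instances.Get call (it is included in roles/compute.viewer), and reuses the project ID from the log above:

```go
// Hypothetical pre-flight check: ask Resource Manager which of the listed
// permissions the current credentials actually hold on the test project.
package main

import (
	"context"
	"fmt"
	"log"

	cloudresourcemanager "google.golang.org/api/cloudresourcemanager/v1"
)

func main() {
	ctx := context.Background()
	svc, err := cloudresourcemanager.NewService(ctx) // ADC credentials
	if err != nil {
		log.Fatal(err)
	}
	req := &cloudresourcemanager.TestIamPermissionsRequest{
		// Assumed to be the permission behind the failing Instances.Get
		// call; it is included in roles/compute.viewer.
		Permissions: []string{"compute.instances.get"},
	}
	// Project ID copied from the log excerpt above.
	resp, err := svc.Projects.TestIamPermissions(
		"k8s-infra-e2e-boskos-088", req).Context(ctx).Do()
	if err != nil {
		log.Fatal(err)
	}
	// The response echoes back only the permissions actually granted.
	fmt.Printf("granted: %v\n", resp.Permissions)
}
```

A check like this could run at the start of the prow/boskos job so that a misconfigured project fails fast instead of flaking mid-bootstrap.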
The Kubernetes project currently lacks enough contributors to adequately respond to all issues.
This bot triages un-triaged issues according to the following rules:
- After 90d of inactivity, lifecycle/stale is applied
- After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
- After 30d of inactivity since lifecycle/rotten was applied, the issue is closed
You can:
- Mark this issue as fresh with /remove-lifecycle stale
- Close this issue with /close
- Offer to help out with Issue Triage
Please send feedback to sig-contributor-experience at kubernetes/community.
/lifecycle stale
The Kubernetes project currently lacks enough active contributors to adequately respond to all issues.
This bot triages un-triaged issues according to the following rules:
- After 90d of inactivity, lifecycle/stale is applied
- After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
- After 30d of inactivity since lifecycle/rotten was applied, the issue is closed
You can:
- Mark this issue as fresh with /remove-lifecycle rotten
- Close this issue with /close
- Offer to help out with Issue Triage
Please send feedback to sig-contributor-experience at kubernetes/community.
/lifecycle rotten
/remove-lifecycle rotten
@cpanato can you elaborate a little bit on what needs documenting? I'm starting to plan out with @shannonxtreme which technical details we need to dig up so that we can write good docs 😅 xref #686
The Kubernetes project currently lacks enough contributors to adequately respond to all issues.
This bot triages un-triaged issues according to the following rules:
- After 90d of inactivity, lifecycle/stale is applied
- After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
- After 30d of inactivity since lifecycle/rotten was applied, the issue is closed
You can:
- Mark this issue as fresh with /remove-lifecycle stale
- Close this issue with /close
- Offer to help out with Issue Triage
Please send feedback to sig-contributor-experience at kubernetes/community.
/lifecycle stale
The Kubernetes project currently lacks enough active contributors to adequately respond to all issues.
This bot triages un-triaged issues according to the following rules:
- After 90d of inactivity, lifecycle/stale is applied
- After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
- After 30d of inactivity since lifecycle/rotten was applied, the issue is closed
You can:
- Mark this issue as fresh with /remove-lifecycle rotten
- Close this issue with /close
- Offer to help out with Issue Triage
Please send feedback to sig-contributor-experience at kubernetes/community.
/lifecycle rotten
The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.
This bot triages issues according to the following rules:
- After 90d of inactivity, lifecycle/stale is applied
- After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
- After 30d of inactivity since lifecycle/rotten was applied, the issue is closed
You can:
- Reopen this issue with /reopen
- Mark this issue as fresh with /remove-lifecycle rotten
- Offer to help out with Issue Triage
Please send feedback to sig-contributor-experience at kubernetes/community.
/close not-planned
@k8s-triage-robot: Closing this issue, marking it as "Not Planned".
In response to this:
/close not-planned
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.