cloud-provider-gcp
CAPG: Upstream CCM manifest doesn't work
Tried deploying the CCM in a CAPG cluster using the provided CCM manifest from https://github.com/kubernetes/cloud-provider-gcp/blob/master/deploy/packages/default/manifest.yaml.
The CCM pod is stuck in CrashLoopBackOff with this error:
unable to load configmap based request-header-client-ca-file: Get "https://127.0.0.1:443/api/v1/namespaces/kube-system/configmaps/extension-apiserver-authentication": dial tcp 127.0.0.1:443: connect: connection refused
This issue is currently awaiting triage.
If the repository maintainers determine this is a relevant issue, they will accept it by applying the triage/accepted label and provide further guidance.
The triage/accepted label can be added by org members by writing /triage accepted in a comment.
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
Please use:

  command: ['/usr/local/bin/cloud-controller-manager']
  args:
    - --cloud-provider=gce
    - --leader-elect=true
    - --use-service-account-credentials

and remove the env.
/kind support
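For reference, a minimal sketch of how that container section could look once the env block is removed (the container name and image are taken from elsewhere in this thread, not verbatim from the upstream manifest):

  containers:
    - name: cloud-controller-manager
      image: k8scloudprovidergcp/cloud-controller-manager:latest
      command: ['/usr/local/bin/cloud-controller-manager']
      args:
        - --cloud-provider=gce
        - --leader-elect=true
        - --use-service-account-credentials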
Hi @mcbenjemaa, thanks for the help. The CCM pod is up now with these args:
  - args:
      - --cloud-provider=gce
      - --leader-elect=true
      - --use-service-account-credentials
      - --allocate-node-cidrs=true
      - --cluster-cidr=192.168.0.0/16
      - --configure-cloud-routes=false
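For anyone adapting this: --cluster-cidr should normally match the pod CIDR configured on the workload cluster, which in a CAPI/CAPG setup lives on the Cluster object. A sketch with a hypothetical cluster name:

  apiVersion: cluster.x-k8s.io/v1beta1
  kind: Cluster
  metadata:
    name: my-capg-cluster          # hypothetical
  spec:
    clusterNetwork:
      pods:
        cidrBlocks:
          - 192.168.0.0/16         # should line up with --cluster-cidr above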
One more question: I see the cloud-controller-manager image being used is k8scloudprovidergcp/cloud-controller-manager:latest. How can I use k8s-version-specific images for the CCM?
You may have to build the image yourself while the release process is being revamped; there are instructions in the README.
The :latest tag is aimed at CI / testing of the project itself, I think.
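Until versioned images are published, one option is to build and push your own image per the README and pin it in the DaemonSet; the registry and tag below are hypothetical placeholders:

  containers:
    - name: cloud-controller-manager
      # hypothetical self-built image; substitute whatever registry/tag you push to
      image: registry.example.com/cloud-controller-manager:v1.30.0
      imagePullPolicy: IfNotPresent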
/retitle CAPG: Upstream CCM manifest doesn't work
I don't think the manifest is necessarily meant to work with CAPG; I would expect CAPG to handle deploying everything?
Otherwise this may be in scope for #686
Self-deployed CCM: I got this error:
message="Error syncing load balancer: failed to ensure load balancer: instance not found"
The Kubernetes project currently lacks enough contributors to adequately respond to all issues.
This bot triages un-triaged issues according to the following rules:
- After 90d of inactivity, lifecycle/stale is applied
- After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
- After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:
- Mark this issue as fresh with /remove-lifecycle stale
- Close this issue with /close
- Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.
/lifecycle stale
Something similar here: I'm trying to deploy the Cloud Controller Manager (CCM) and I'm encountering the following error:
I0823 08:10:42.838284 1 node_controller.go:391] Initializing node minplus0-md-2-vbvmr-856l7 with cloud provider
I0823 08:10:42.920926 1 gen.go:15649] GCEInstances.Get(context.Background.WithDeadline(2024-08-23 09:10:42.83965981 +0000 UTC m=+3629.567729051 [59m59.918720336s]), Key{"minplus0-md-2-vbvmr-856l7", zone: "europe-west4-b"}) = <nil>, googleapi: Error 404: The resource 'projects/clusterapi-369611/zones/europe-west4-b/instances/minplus0-md-2-vbvmr-856l7' was not found, notFound
E0823 08:10:42.921062 1 node_controller.go:213] error syncing 'minplus0-md-2-vbvmr-856l7': failed to get instance metadata for node minplus0-md-2-vbvmr-856l7: failed to get instance ID from cloud provider: instance not found, requeuing
I don't understand why CCM is adding the zone label as follows:
I0823 08:10:41.974944 1 node_controller.go:493] Adding node label from cloud provider: beta.kubernetes.io/instance-type=n2-standard-2
I0823 08:10:41.974950 1 node_controller.go:494] Adding node label from cloud provider: node.kubernetes.io/instance-type=n2-standard-2
I0823 08:10:41.974954 1 node_controller.go:505] Adding node label from cloud provider: failure-domain.beta.kubernetes.io/zone=europe-west4-b
I0823 08:10:41.974958 1 node_controller.go:506] Adding node label from cloud provider: topology.kubernetes.io/zone=europe-west4-b
I0823 08:10:41.974963 1 node_controller.go:516] Adding node label from cloud provider: failure-domain.beta.kubernetes.io/region=europe-west4
I0823 08:10:41.974968 1 node_controller.go:517] Adding node label from cloud provider: topology.kubernetes.io/region=europe-west4
The correct zone should be gce://clusterapi-369611/europe-west4-c/minplus0-md-2-vbvmr-856l7.
This is how I'm deploying CCM:
    - name: cloud-controller-manager
      image: k8scloudprovidergcp/cloud-controller-manager:latest
      imagePullPolicy: IfNotPresent
      # ko puts it somewhere else... command: ['/usr/local/bin/cloud-controller-manager']
      command: ['/usr/local/bin/cloud-controller-manager']
      args:
        - --cloud-provider=gce # Add your own cloud provider here!
        - --leader-elect=true
        - --use-service-account-credentials
        # these flags will vary for every cloud provider
        - --allocate-node-cidrs=true
        - --configure-cloud-routes=true
        - --cluster-cidr=192.168.0.0/16
        - --v=4
      livenessProbe:
        failureThreshold: 3
        httpGet:
          host: 127.0.0.1
          path: /healthz
          port: 10258
          scheme: HTTPS
        initialDelaySeconds: 15
        periodSeconds: 10
        successThreshold: 1
        timeoutSeconds: 15
      resources:
        requests:
          cpu: "200m"
      volumeMounts:
        - mountPath: /etc/kubernetes/cloud.config
          name: cloudconfig
          readOnly: true
  hostNetwork: true
  priorityClassName: system-cluster-critical
  volumes:
    - hostPath:
        path: /etc/kubernetes/cloud.config
        type: ""
      name: cloudconfig
"The correct zone should be gce://clusterapi-369611/europe-west4-c/minplus0-md-2-vbvmr-856l7."
What do you mean by correct zone there? The instance URL is https://www.googleapis.com/compute/v1/projects/{PROJECT}/zones/{ZONE}/instances/{VM_INSTANCE}; that is the providerID, isn't it?
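For context on the two formats being compared here: the provider ID recorded on a Node has the form gce://<project>/<zone>/<instance>, whereas the URL above is the Compute API resource path. A sketch of the Node field, using the values from this thread (so the providerID value is illustrative):

  apiVersion: v1
  kind: Node
  metadata:
    name: minplus0-md-2-vbvmr-856l7
  spec:
    # populated by the kubelet / cloud provider; the zone segment is what is in dispute here
    providerID: gce://clusterapi-369611/europe-west4-c/minplus0-md-2-vbvmr-856l7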
The issue is that the GCEInstances.Get function constructs the provider ID with the wrong zone. It assumes the zone must match the one where the master CCM is deployed (in this case, europe-west4-b), instead of the correct one, which is europe-west4-c. That's why the CCM couldn't find the instance.
Is there any way to make the CCM check every single zone? Maybe a multizone option or something similar?
Solved!
      args:
        - --cloud-provider=gce # Add your own cloud provider here!
        - --leader-elect=true
        - --use-service-account-credentials
        # these flags will vary for every cloud provider
        - --allocate-node-cidrs=true
        - --cluster-cidr=192.168.0.0/16
        - --v=4
        - --cloud-config=/etc/kubernetes/gce.conf
      livenessProbe:
        failureThreshold: 3
        httpGet:
          host: 127.0.0.1
          path: /healthz
          port: 10258
          scheme: HTTPS
        initialDelaySeconds: 15
        periodSeconds: 10
        successThreshold: 1
        timeoutSeconds: 15
      resources:
        requests:
          cpu: "200m"
      volumeMounts:
        - mountPath: /etc/kubernetes/gce.conf
          name: cloudconfig
          readOnly: true
  hostNetwork: true
  priorityClassName: system-cluster-critical
  volumes:
    - hostPath:
        path: /etc/kubernetes/gce.conf
        type: FileOrCreate
      name: cloudconfig
where gce.conf is:
[Global]
multizone=true
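For reference, a slightly fuller gce.conf sketch; the extra [Global] keys follow the GCE cloud config format, but the values are placeholders, so keep only what applies to your project:

  [Global]
  # placeholders; adjust to your environment
  project-id = clusterapi-369611
  network-name = my-network
  subnetwork-name = my-subnet
  node-tags = my-cluster-node
  multizone = true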
The Kubernetes project currently lacks enough active contributors to adequately respond to all issues.
This bot triages un-triaged issues according to the following rules:
- After 90d of inactivity, lifecycle/stale is applied
- After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
- After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:
- Mark this issue as fresh with /remove-lifecycle rotten
- Close this issue with /close
- Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.
/lifecycle rotten