ControlPlane node is not ready in scalability tests when run on GCE
In scalability tests, the control-plane node never becomes ready. We usually don't suffer from this, as almost all our tests run 100+ nodes and we tolerate 1% of nodes not being initialized correctly. But it is problematic for tests like https://testgrid.k8s.io/sig-scalability-experiments#watchlist-off
Looking into the kubelet logs, the reason seems to be:
May 11 09:09:13.886270 bootstrap-e2e-master kubelet[2782]: E0511 09:09:13.886233 2782 kubelet.go:2753] "Container runtime network not ready" networkReady="NetworkReady=false reason:NetworkPluginNotReady message:Network plugin returns error: cni plugin not initialized"
FWIW, it seems to be related to some of our preset settings, as e.g. https://testgrid.k8s.io/sig-scalability-node#node-containerd-throughput doesn't suffer from it.
@kubernetes/sig-scalability @mborsz @Argh4k @p0lyn0mial - FYI
The only suspicious one that I see in our preset is this one:
- name: KUBE_GCE_PRIVATE_CLUSTER
value: "true"
containerd logs from master:
May 12 08:43:21.379251 bootstrap-e2e-master containerd[650]: time="2023-05-12T08:43:21.379201176Z" level=error msg="failed to load cni during init, please check CRI plugin status before setting up network for pods" error="cni config load failed: no network config found in /etc/cni/net.d: cni plugin not initialized: failed to load cni config"
On nodes, the CNI config comes from a template: NetworkPluginConfTemplate:/home/kubernetes/cni.template
On the master it is empty. In the master logs I can see that setup-containerd is called from configure-helper and it should set the template path. My guess is that https://github.com/kubernetes/kubernetes/blob/master/cluster/gce/gci/configure-helper.sh#L3181 is executed, even though it should not be.
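For context, that template path comes from containerd's CRI plugin config; on a healthy node, /etc/containerd/config.toml contains roughly something like the following (a sketch using the usual GCE defaults, not copied from this particular run):

```toml
# Sketch of the CRI CNI section that setup-containerd is expected to write.
# Paths are the typical GCE/GCI defaults and are illustrative only.
[plugins."io.containerd.grpc.v1.cri".cni]
  bin_dir = "/home/kubernetes/bin"
  conf_dir = "/etc/cni/net.d"
  conf_template = "/home/kubernetes/cni.template"
```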
I have SSHed onto the master and it looks like all CNI-related configuration files are in place. kubectl describe node on the master shows:
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal RegisteredNode 21m node-controller Node bootstrap-e2e-master event: Registered Node bootstrap-e2e-master in Controller
Normal CIDRAssignmentFailed 26s (x56 over 21m) cidrAllocator Node bootstrap-e2e-master status is now: CIDRAssignmentFailed
Kube controller manager logs:
E0512 13:12:32.119653 11 cloud_cidr_allocator.go:315] "Failed to update the node PodCIDR after multiple attempts" err="failed to patch node CIDR: Node \"bootstrap-e2e-master\" is invalid: spec.podCIDRs: Invalid value: []string{\"10.64.0.0/24\", \"10.40.0.2/32\"}: may specify no more than one CIDR for each IP family" node="bootstrap-e2e-master" cidrStrings=["10.64.0.0/24","10.40.0.2/32"]
E0512 13:12:32.119671 11 cloud_cidr_allocator.go:178] "Error updating CIDR" err="failed to patch node CIDR: Node \"bootstrap-e2e-master\" is invalid: spec.podCIDRs: Invalid value: []string{\"10.64.0.0/24\", \"10.40.0.2/32\"}: may specify no more than one CIDR for each IP family" workItem="bootstrap-e2e-master"
E0512 13:12:32.119682 11 cloud_cidr_allocator.go:187] "Exceeded retry count, dropping from queue" workItem="bootstrap-e2e-master"
I0512 13:12:32.119755 11 event.go:307] "Event occurred" object="bootstrap-e2e-master" fieldPath="" kind="Node" apiVersion="v1" type="Normal" reason="CIDRAssignmentFailed" message="Node bootstrap-e2e-master status is now: CIDRAssignmentFailed"
Wojtek's gut feeling was right.
@p0lyn0mial if you want, we can create a PR to add:
- --env=KUBE_GCE_PRIVATE_CLUSTER=false
to the tests, and they should work just fine. In the meantime I will try to understand why KUBE_GCE_PRIVATE_CLUSTER makes the master node get two CIDRs.
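Concretely, that would just mean adding the flag to the job's arguments in test-infra, roughly like this (only the added line matters; the surrounding job fields are illustrative):

```yaml
# Illustrative fragment of the prow job spec; job name and other fields omitted.
spec:
  containers:
  - args:
    - --env=KUBE_GCE_PRIVATE_CLUSTER=false
```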
Does it have cloud NAT enabled?
If not, the private network may have issues fetching e.g. from registry.k8s.io, which, unlike GCR, isn't a first-party GCP service.
cc @aojea re: GCE cidr allocation :-)
E0512 13:12:32.119671 11 cloud_cidr_allocator.go:178] "Error updating CIDR" err="failed to patch node CIDR: Node "bootstrap-e2e-master" is invalid: spec.podCIDRs: Invalid value: []string{"10.64.0.0/24", "10.40.0.2/32"}: may specify no more than one CIDR for each IP family" workItem="bootstrap-e2e-master"
https://github.com/kubernetes/test-infra/issues/29500#issuecomment-1545732863 @basantsa1989 we have a bug in the allocator https://github.com/kubernetes/kubernetes/commit/a013c6a2db54c59b78de974b181586723e088246
If we receive multiple CIDRs before patching for dual-stack, we should validate that they really are dual-stack.
We have to fix it both in k/k and in cloud-provider-gcp: https://github.com/kubernetes/cloud-provider-gcp/blob/67d1fd9f7255629fac3adfc956d0c8b2ac5f50f0/pkg/controller/nodeipam/ipam/cloud_cidr_allocator.go#L341-L344
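To make the idea concrete, here is a minimal sketch of that kind of pre-check (a hypothetical standalone helper, not the actual allocator code):

```go
// Hypothetical sketch of the pre-check suggested above: only keep multiple
// CIDRs if they really form a dual-stack pair (different IP families);
// otherwise fall back to the first CIDR seen for each family.
package main

import (
	"fmt"
	"net"
)

func pickNodeCIDRs(cidrStrings []string) ([]string, error) {
	var picked []string
	seen := map[bool]bool{} // keyed by "is IPv6"
	for _, s := range cidrStrings {
		ip, _, err := net.ParseCIDR(s)
		if err != nil {
			return nil, fmt.Errorf("invalid CIDR %q: %v", s, err)
		}
		isIPv6 := ip.To4() == nil
		if seen[isIPv6] {
			// A second CIDR of the same family is not dual-stack; skip it
			// instead of trying to patch both onto the node.
			continue
		}
		seen[isIPv6] = true
		picked = append(picked, s)
	}
	return picked, nil
}

func main() {
	// The two IPv4 ranges from this issue: the pod CIDR and the /32
	// master-internal-IP alias added by KUBE_GCE_PRIVATE_CLUSTER.
	fmt.Println(pickNodeCIDRs([]string{"10.64.0.0/24", "10.40.0.2/32"}))
	// Output: [10.64.0.0/24] <nil>
}
```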
FYI: https://github.com/kubernetes/kubernetes/blob/master/cluster/gce/util.sh#L3008 is the place where we add the master internal IP as a second alias range when KUBE_GCE_PRIVATE_CLUSTER is used.
This second IP is then picked up by kube-controller-manager (https://github.com/kubernetes/kubernetes/blob/master/staging/src/k8s.io/legacy-cloud-providers/gce/gce_instances.go#L496), so the allocator thinks we have a dual-stack setup and tries to apply both CIDRs, which fails because a node can have at most one IPv4 CIDR.
Kube controller manager logs:
E0512 13:12:32.119653 11 cloud_cidr_allocator.go:315] "Failed to update the node PodCIDR after multiple attempts" err="failed to patch node CIDR: Node \"bootstrap-e2e-master\" is invalid: spec.podCIDRs: Invalid value: []string{\"10.64.0.0/24\", \"10.40.0.2/32\"}: may specify no more than one CIDR for each IP family" node="bootstrap-e2e-master" cidrStrings=["10.64.0.0/24","10.40.0.2/32"]
E0512 13:12:32.119671 11 cloud_cidr_allocator.go:178] "Error updating CIDR" err="failed to patch node CIDR: Node \"bootstrap-e2e-master\" is invalid: spec.podCIDRs: Invalid value: []string{\"10.64.0.0/24\", \"10.40.0.2/32\"}: may specify no more than one CIDR for each IP family" workItem="bootstrap-e2e-master"
E0512 13:12:32.119682 11 cloud_cidr_allocator.go:187] "Exceeded retry count, dropping from queue" workItem="bootstrap-e2e-master"
I0512 13:12:32.119755 11 event.go:307] "Event occurred" object="bootstrap-e2e-master" fieldPath="" kind="Node" apiVersion="v1" type="Normal" reason="CIDRAssignmentFailed" message="Node bootstrap-e2e-master status is now: CIDRAssignmentFailed"
@Argh4k do you have the entire logs?
@aojea https://gcsweb.k8s.io/gcs/sig-scalability-logs/ci-kubernetes-e2e-gci-gce-scalability-watch-list-off/1658029086385115136/bootstrap-e2e-master/ has all the logs from the master
/sig network
Based on @basantsa1989's comment https://github.com/kubernetes/kubernetes/pull/118043#issuecomment-1553661135, the allocator is working as expected and the problem is that this is not supported:
https://github.com/kubernetes/kubernetes/blob/8db4d63245a89a78d76ff5916c37439805b11e5f/cluster/gce/util.sh#L3008
Can we configure the cluster in a different way so that we don't pass two CIDRs?
I hope we can. Unfortunately I haven't had much time to look into this, and the other work was unblocked by running the tests in a small public cluster.
@Argh4k Hey, a friendly reminder to work on this issue :)
It looks like having a private cluster would increase the available egress bandwidth. A higher egress bandwidth would allow us to generate larger test traffic. Currently, we had to reduce the test traffic because it seems to be throttled by the limited egress bandwidth, which shows up as increased latency.
See https://github.com/kubernetes/perf-tests/issues/2287
The Kubernetes project currently lacks enough contributors to adequately respond to all issues.
This bot triages un-triaged issues according to the following rules:
- After 90d of inactivity, lifecycle/stale is applied
- After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
- After 30d of inactivity since lifecycle/rotten was applied, the issue is closed
You can:
- Mark this issue as fresh with /remove-lifecycle stale
- Close this issue with /close
- Offer to help out with Issue Triage
Please send feedback to sig-contributor-experience at kubernetes/community.
/lifecycle stale
I think that this issue still hasn't been resolved
/remove-lifecycle stale
The Kubernetes project currently lacks enough contributors to adequately respond to all issues.
This bot triages un-triaged issues according to the following rules:
- After 90d of inactivity, lifecycle/stale is applied
- After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
- After 30d of inactivity since lifecycle/rotten was applied, the issue is closed
You can:
- Mark this issue as fresh with /remove-lifecycle stale
- Close this issue with /close
- Offer to help out with Issue Triage
Please send feedback to sig-contributor-experience at kubernetes/community.
/lifecycle stale
I think that this issue still hasn't been resolved
/remove-lifecycle stale
The Kubernetes project currently lacks enough contributors to adequately respond to all issues.
This bot triages un-triaged issues according to the following rules:
- After 90d of inactivity, lifecycle/stale is applied
- After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
- After 30d of inactivity since lifecycle/rotten was applied, the issue is closed
You can:
- Mark this issue as fresh with /remove-lifecycle stale
- Close this issue with /close
- Offer to help out with Issue Triage
Please send feedback to sig-contributor-experience at kubernetes/community.
/lifecycle stale
/remove-lifecycle stale
@aojea thoughts on this?
The Kubernetes project currently lacks enough contributors to adequately respond to all issues.
This bot triages un-triaged issues according to the following rules:
- After 90d of inactivity, lifecycle/stale is applied
- After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
- After 30d of inactivity since lifecycle/rotten was applied, the issue is closed
You can:
- Mark this issue as fresh with /remove-lifecycle stale
- Close this issue with /close
- Offer to help out with Issue Triage
Please send feedback to sig-contributor-experience at kubernetes/community.
/lifecycle stale
The Kubernetes project currently lacks enough active contributors to adequately respond to all issues.
This bot triages un-triaged issues according to the following rules:
- After 90d of inactivity, lifecycle/stale is applied
- After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
- After 30d of inactivity since lifecycle/rotten was applied, the issue is closed
You can:
- Mark this issue as fresh with /remove-lifecycle rotten
- Close this issue with /close
- Offer to help out with Issue Triage
Please send feedback to sig-contributor-experience at kubernetes/community.
/lifecycle rotten
The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.
This bot triages issues according to the following rules:
- After 90d of inactivity, lifecycle/stale is applied
- After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
- After 30d of inactivity since lifecycle/rotten was applied, the issue is closed
You can:
- Reopen this issue with /reopen
- Mark this issue as fresh with /remove-lifecycle rotten
- Offer to help out with Issue Triage
Please send feedback to sig-contributor-experience at kubernetes/community.
/close not-planned
@k8s-triage-robot: Closing this issue, marking it as "Not Planned".
In response to this:
The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.
This bot triages issues according to the following rules:
- After 90d of inactivity, lifecycle/stale is applied
- After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
- After 30d of inactivity since lifecycle/rotten was applied, the issue is closed
You can:
- Reopen this issue with /reopen
- Mark this issue as fresh with /remove-lifecycle rotten
- Offer to help out with Issue Triage
Please send feedback to sig-contributor-experience at kubernetes/community.
/close not-planned
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.