Couldn't get resource list for projectcalico.org/v3: the server is currently unable to handle the request
@caseydavenport As you requested, I'm raising this as a separate issue.
I installed the latest version of Calico using this Helm chart. The kube-apiserver-kmaster1 pod logs the following error: v3.projectcalico.org failed with: failing or missing response from https://**:443/apis/projectcalico.org/v3.
In addition, seemingly every kubectl command prints errors about these CRDs.
E0406 15:33:42.335160 55793 memcache.go:106] couldn't get resource list for projectcalico.org/v3: the server is currently unable to handle the request
NAME STATUS ROLES AGE VERSION
kmaster1 Ready control-plane 19d v1.26.3
kworker1 Ready <none> 18d v1.26.3
kworker2 Ready <none> 18d v1.26.3
These CRDs are automatically installed by the Helm chart mentioned above.
--> k api-resources | grep calico
E0406 15:34:26.465805 55853 memcache.go:255] couldn't get resource list for projectcalico.org/v3: the server is currently unable to handle the request
E0406 15:34:26.481896 55853 memcache.go:106] couldn't get resource list for projectcalico.org/v3: the server is currently unable to handle the request
bgpconfigurations crd.projectcalico.org/v1 false BGPConfiguration
bgppeers crd.projectcalico.org/v1 false BGPPeer
blockaffinities crd.projectcalico.org/v1 false BlockAffinity
caliconodestatuses crd.projectcalico.org/v1 false CalicoNodeStatus
clusterinformations crd.projectcalico.org/v1 false ClusterInformation
felixconfigurations crd.projectcalico.org/v1 false FelixConfiguration
globalnetworkpolicies crd.projectcalico.org/v1 false GlobalNetworkPolicy
globalnetworksets crd.projectcalico.org/v1 false GlobalNetworkSet
hostendpoints crd.projectcalico.org/v1 false HostEndpoint
ipamblocks crd.projectcalico.org/v1 false IPAMBlock
ipamconfigs crd.projectcalico.org/v1 false IPAMConfig
ipamhandles crd.projectcalico.org/v1 false IPAMHandle
ippools crd.projectcalico.org/v1 false IPPool
ipreservations crd.projectcalico.org/v1 false IPReservation
kubecontrollersconfigurations crd.projectcalico.org/v1 false KubeControllersConfiguration
networkpolicies crd.projectcalico.org/v1 true NetworkPolicy
networksets crd.projectcalico.org/v1 true NetworkSet
error: unable to retrieve the complete list of server APIs: projectcalico.org/v3: the server is currently unable to handle the request
Do I understand correctly that these crd.projectcalico.org/v1 CRDs are still needed - so I should not delete them - and that I need to manually install the v3 CRDs? If so, where can I download these v3 CRDs? I can't find them.
I believe chet-tuttle-3 is facing some similar issues
Do I understand correctly that these crd.projectcalico.org/v1 CRDs are still needed - so I should not delete them - and that I need to manually install the v3 CRDs? If so, where can I download these v3 CRDs? I can't find them.
The v1 resources are CRDs and should be present - definitely don't delete those.
The v3 resources are not CRDs - they are implemented by the calico-apiserver pod in the calico-apiserver namespace.
the server is currently unable to handle the request
This suggests a problem with the Calico API server, or a problem with the kube-apiserver being unable to communicate with the Calico API server. I'd:
- kubectl describe apiservice to see if there are any breadcrumbs to follow there.
- kubectl logs --tail=-1 -n calico-apiserver -l k8s-app=calico-apiserver to get the full API server logs.
- kubectl describe tigerastatus apiserver for potentially further breadcrumbs.
The only apiservice with a status of False is v3.projectcalico.org, which has the following error message: failing or missing response from https://***:443/apis/projectcalico.org/v3: Get "https://***:443/apis/projectcalico.org/v3": context deadline exceeded
➜ ~ kubectl describe apiservice v3.projectcalico.org
Name: v3.projectcalico.org
Namespace:
Labels: <none>
Annotations: <none>
API Version: apiregistration.k8s.io/v1
Kind: APIService
Metadata:
Creation Timestamp: 2023-04-06T12:54:20Z
Managed Fields:
API Version: apiregistration.k8s.io/v1
Fields Type: FieldsV1
fieldsV1:
f:metadata:
f:ownerReferences:
.:
k:{"uid":"***"}:
f:spec:
f:caBundle:
f:group:
f:groupPriorityMinimum:
f:service:
.:
f:name:
f:namespace:
f:port:
f:version:
f:versionPriority:
Manager: operator
Operation: Update
Time: 2023-04-06T12:54:20Z
API Version: apiregistration.k8s.io/v1
Fields Type: FieldsV1
fieldsV1:
f:status:
f:conditions:
.:
k:{"type":"Available"}:
.:
f:lastTransitionTime:
f:message:
f:reason:
f:status:
f:type:
Manager: kube-apiserver
Operation: Update
Subresource: status
Time: 2023-04-18T12:35:03Z
Owner References:
API Version: operator.tigera.io/v1
Block Owner Deletion: true
Controller: true
Kind: APIServer
Name: default
UID: ***
Resource Version: ***
UID: ***
Spec:
Ca Bundle: ***
Group: projectcalico.org
Group Priority Minimum: 1500
Service:
Name: calico-api
Namespace: calico-apiserver
Port: 443
Version: v3
Version Priority: 200
Status:
Conditions:
Last Transition Time: 2023-04-06T12:54:20Z
Message: failing or missing response from https://10.107.208.239:443/apis/projectcalico.org/v3: Get "https://10.107.208.239:443/apis/projectcalico.org/v3": dial tcp 10.107.208.239:443: i/o timeout
Reason: FailedDiscoveryCheck
Status: False
Type: Available
Events: <none>
The logs of the calico-apiserver look like this:
➜ ~ kubectl logs --tail=-1 -n calico-apiserver -l k8s-app=calico-apiserver
Version: v3.25.1
Build date: 2023-03-30T23:52:23+0000
Git tag ref: v3.25.1
Git commit: 82dadbce1
I0413 15:15:19.483989 1 plugins.go:158] Loaded 2 mutating admission controller(s) successfully in the following order: NamespaceLifecycle,MutatingAdmissionWebhook.
I0413 15:15:19.484036 1 plugins.go:161] Loaded 1 validating admission controller(s) successfully in the following order: ValidatingAdmissionWebhook.
I0413 15:15:19.604542 1 run_server.go:69] Running the API server
I0413 15:15:19.604578 1 run_server.go:58] Starting watch extension
W0413 15:15:19.606431 1 client_config.go:617] Neither --kubeconfig nor --master was specified. Using the inClusterConfig. This might not work.
I0413 15:15:19.630055 1 secure_serving.go:210] Serving securely on [::]:5443
I0413 15:15:19.630147 1 dynamic_serving_content.go:132] "Starting controller" name="serving-cert::/calico-apiserver-certs/tls.crt::/calico-apiserver-certs/tls.key"
I0413 15:15:19.630257 1 tlsconfig.go:240] "Starting DynamicServingCertificateController"
I0413 15:15:19.630679 1 run_server.go:80] apiserver is ready.
I0413 15:15:19.631104 1 requestheader_controller.go:169] Starting RequestHeaderAuthRequestController
I0413 15:15:19.631114 1 shared_informer.go:255] Waiting for caches to sync for RequestHeaderAuthRequestController
I0413 15:15:19.631204 1 configmap_cafile_content.go:202] "Starting controller" name="client-ca::kube-system::extension-apiserver-authentication::client-ca-file"
I0413 15:15:19.631212 1 shared_informer.go:255] Waiting for caches to sync for client-ca::kube-system::extension-apiserver-authentication::client-ca-file
I0413 15:15:19.631282 1 configmap_cafile_content.go:202] "Starting controller" name="client-ca::kube-system::extension-apiserver-authentication::requestheader-client-ca-file"
I0413 15:15:19.631290 1 shared_informer.go:255] Waiting for caches to sync for client-ca::kube-system::extension-apiserver-authentication::requestheader-client-ca-file
I0413 15:15:19.732007 1 shared_informer.go:262] Caches are synced for client-ca::kube-system::extension-apiserver-authentication::requestheader-client-ca-file
I0413 15:15:19.732076 1 shared_informer.go:262] Caches are synced for RequestHeaderAuthRequestController
I0413 15:15:19.732510 1 shared_informer.go:262] Caches are synced for client-ca::kube-system::extension-apiserver-authentication::client-ca-file
Version: v3.25.1
Build date: 2023-03-30T23:52:23+0000
Git tag ref: v3.25.1
Git commit: 82dadbce1
I0413 15:15:45.802642 1 plugins.go:158] Loaded 2 mutating admission controller(s) successfully in the following order: NamespaceLifecycle,MutatingAdmissionWebhook.
I0413 15:15:45.802806 1 plugins.go:161] Loaded 1 validating admission controller(s) successfully in the following order: ValidatingAdmissionWebhook.
I0413 15:15:45.871553 1 run_server.go:58] Starting watch extension
I0413 15:15:45.871726 1 run_server.go:69] Running the API server
W0413 15:15:45.872885 1 client_config.go:617] Neither --kubeconfig nor --master was specified. Using the inClusterConfig. This might not work.
I0413 15:15:45.885723 1 secure_serving.go:210] Serving securely on [::]:5443
I0413 15:15:45.886356 1 requestheader_controller.go:169] Starting RequestHeaderAuthRequestController
I0413 15:15:45.886370 1 shared_informer.go:255] Waiting for caches to sync for RequestHeaderAuthRequestController
I0413 15:15:45.886523 1 run_server.go:80] apiserver is ready.
I0413 15:15:45.886549 1 dynamic_serving_content.go:132] "Starting controller" name="serving-cert::/calico-apiserver-certs/tls.crt::/calico-apiserver-certs/tls.key"
I0413 15:15:45.886667 1 tlsconfig.go:240] "Starting DynamicServingCertificateController"
I0413 15:15:45.888123 1 configmap_cafile_content.go:202] "Starting controller" name="client-ca::kube-system::extension-apiserver-authentication::requestheader-client-ca-file"
I0413 15:15:45.888133 1 shared_informer.go:255] Waiting for caches to sync for client-ca::kube-system::extension-apiserver-authentication::requestheader-client-ca-file
I0413 15:15:45.888363 1 configmap_cafile_content.go:202] "Starting controller" name="client-ca::kube-system::extension-apiserver-authentication::client-ca-file"
I0413 15:15:45.888375 1 shared_informer.go:255] Waiting for caches to sync for client-ca::kube-system::extension-apiserver-authentication::client-ca-file
I0413 15:15:45.986627 1 shared_informer.go:262] Caches are synced for RequestHeaderAuthRequestController
I0413 15:15:45.988477 1 shared_informer.go:262] Caches are synced for client-ca::kube-system::extension-apiserver-authentication::client-ca-file
I0413 15:15:45.988829 1 shared_informer.go:262] Caches are synced for client-ca::kube-system::extension-apiserver-authentication::requestheader-client-ca-file
And the tigerastatus apiserver looks like this:
➜ ~ kubectl describe tigerastatus apiserver
Name: apiserver
Namespace:
Labels: <none>
Annotations: <none>
API Version: operator.tigera.io/v1
Kind: TigeraStatus
Metadata:
Creation Timestamp: 2023-03-24T16:01:19Z
Generation: 1
Managed Fields:
API Version: operator.tigera.io/v1
Fields Type: FieldsV1
fieldsV1:
f:spec:
Manager: operator
Operation: Update
Time: 2023-03-24T16:01:19Z
API Version: operator.tigera.io/v1
Fields Type: FieldsV1
fieldsV1:
f:status:
.:
f:conditions:
Manager: operator
Operation: Update
Subresource: status
Time: 2023-04-13T15:15:56Z
Resource Version: ***
UID: ***
Spec:
Status:
Conditions:
Last Transition Time: 2023-04-06T12:54:24Z
Message: All Objects Available
Observed Generation: 1
Reason: AllObjectsAvailable
Status: False
Type: Degraded
Last Transition Time: 2023-04-13T15:15:56Z
Message: All objects available
Observed Generation: 1
Reason: AllObjectsAvailable
Status: True
Type: Available
Last Transition Time: 2023-04-13T15:15:56Z
Message: All Objects Available
Observed Generation: 1
Reason: AllObjectsAvailable
Status: False
Type: Progressing
Events: <none>
@caseydavenport Can you maybe point me in the correct direction with this information?
I'm also hitting this issue; my cluster runs on OpenStack VMs.
@lucasscheepers I was running into the same issue and was able to get around it by following the Manifest Install directions here: https://docs.tigera.io/calico/latest/operations/install-apiserver
Specifically the patch command fixed the issue:
kubectl patch apiservice v3.projectcalico.org -p \
    "{\"spec\": {\"caBundle\": \"$(kubectl get secret -n calico-apiserver calico-apiserver-certs -o go-template='{{ index .data "apiserver.crt" }}')\"}}"
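After applying the patch, a quick way to confirm the fix (assuming the same apiservice name and namespace as above) is to check that the APIService reports as available again:
kubectl get apiservice v3.projectcalico.org
If the caBundle was the problem, the AVAILABLE column should eventually flip from False (FailedDiscoveryCheck) to True.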
Getting the same issue while trying to install the Prometheus operator.
❯ k get apiservices.apiregistration.k8s.io -A
NAME SERVICE AVAILABLE AGE
v2.autoscaling Local True 59d
v2beta1.helm.toolkit.fluxcd.io Local True 11d
v2beta2.autoscaling Local True 59d
v3.projectcalico.org calico-apiserver/calico-api False (FailedDiscoveryCheck) 6m38s
❯ k get apiservices.apiregistration.k8s.io v3.projectcalico.org -o yaml
apiVersion: apiregistration.k8s.io/v1
kind: APIService
metadata:
creationTimestamp: "2023-05-08T00:17:15Z"
name: v3.projectcalico.org
ownerReferences:
- apiVersion: operator.tigera.io/v1
blockOwnerDeletion: true
controller: true
kind: APIServer
name: default
uid: 34d2ccfa-07e2-4ec8-82b1-25a3e9e3be73
resourceVersion: "25407370"
uid: 830669ef-f81a-4e2d-9765-e3e2066f8f33
spec:
caBundle: LS0tLS1CRUdJTiBDRVJUSUZJQ0FURS0tLS0tCk1JSUR5akNDQXJLZ0F3SUJBZ0lJVjgwSlNBRkFKWkF3RFFZSktvWklodmNOQVFFTEJRQXdJVEVmTUIwR0ExVUUKQXhNV2RHbG5aWEpoTFc5d1pYSmhkRzl5TFhOcFoyNWxjakFlRncweU16QTFNRGN3TVRNMU5EVmF
group: projectcalico.org
groupPriorityMinimum: 1500
service:
name: calico-api
namespace: calico-apiserver
port: 443
version: v3
versionPriority: 200
status:
conditions:
- lastTransitionTime: "2023-05-08T00:17:15Z"
message: 'failing or missing response from https://10.20.3.132:5443/apis/projectcalico.org/v3:
Get "https://10.20.3.132:5443/apis/projectcalico.org/v3": dial tcp 10.20.3.132:5443:
i/o timeout'
reason: FailedDiscoveryCheck
status: "False"
type: Available
I resolved the Prometheus operator errors by using a newer Helm release version, but I'm still getting:
❯ k get apiservices.apiregistration.k8s.io v3.projectcalico.org -o yaml
apiVersion: apiregistration.k8s.io/v1
kind: APIService
metadata:
creationTimestamp: "2023-05-11T19:04:37Z"
name: v3.projectcalico.org
ownerReferences:
group: projectcalico.org
groupPriorityMinimum: 1500
service:
name: calico-api
namespace: calico-apiserver
port: 443
version: v3
versionPriority: 200
status:
conditions:
- lastTransitionTime: "2023-05-11T19:04:37Z"
message: 'failing or missing response from https://10.10.101.253:5443/apis/projectcalico.org/v3:
Get "https://10.10.101.253:5443/apis/projectcalico.org/v3": dial tcp 10.10.101.253:5443:
i/o timeout'
reason: FailedDiscoveryCheck
status: "False"
type: Available
@lucasscheepers did you manage to resolve it? Namespace deletions get stuck in terminating state due to this apiservice not being responsive. Any way of turning it off?
We are facing the same problem when installing the tigera-operator Helm chart with the APIServer enabled on an EKS cluster.
We are facing the same problem when installing the tigera-operator Helm chart with the APIServer enabled on an EKS cluster.
I was also facing the same issue, and fixed it for the time being by running
kubectl delete apiserver default
Based on https://docs.tigera.io/calico/latest/operations/install-apiserver#uninstall-the-calico-api-server
Since we are using the default Calico Helm chart based install, I think the apiserver was getting created but perhaps not configured properly. And since I doubt we need to update Calico settings from kubectl as part of our use case, I think it is best to delete it for now. I will also try to find a Helm value in the tigera-operator chart to disable this from the start, if possible.
PS: I am new to Calico, so please let me know if this is "unsafe" to remove, although the documentation above does not seem to suggest so.
EDIT: It is easy to disable the apiServer with the Helm values:
apiServer:
  enabled: true # Change to false
Also, it seems it is not so important after all (https://docs.tigera.io/calico/latest/reference/architecture/overview#calico-api-server). The component architecture says it is only needed to manage Calico with kubectl, and I think that would logically mean it is not used from "within" the cluster.
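For completeness, applying that change on an existing install might look roughly like this (a sketch only; the release name, namespace, and chart reference are assumptions based on the standard install docs, so adjust to your setup):
helm upgrade calico projectcalico/tigera-operator \
  --namespace tigera-operator \
  --set apiServer.enabled=false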
If you are uninstalling the apiserver, then you won't be able to install networkpolicies with a Helm chart. Working around the problem doesn't solve the issue.
For some reason, the calico-apiserver pod is failing its liveness probes because the apiserver is not starting correctly or something is not working at all; as a result, the apiservice is reported as FailedDiscoveryCheck. I tried to play around with the deployment and other things but wasn't able to achieve anything. Is there any way to enable debug logs for the apiserver?
I also saw that the CSI node driver for Calico was failing with the following error:
kubectl logs -f -n calico-system csi-node-driver-pzwsl -c csi-node-driver-registrar
/usr/local/bin/node-driver-registrar: error while loading shared libraries: libresolv.so.2: cannot open shared object file: No such file or directory
If you are uninstalling the apiserver, then you won't be able to install networkpolicies with a Helm chart.
I am a bit confused and might be missing something here, but I think that, granted, without the Calico API server you will not be able to use the projectcalico.org/v3 apiVersion, but you should still be able to use networking.k8s.io/v1 for the NetworkPolicy resource? I don't know if there is a major difference between the two, and a quick search says I can still use the latter within a Kubernetes cluster that has the Calico CNI installed.
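For illustration, a minimal networking.k8s.io/v1 policy like the one below (a generic example; the namespace is a placeholder) is, as far as I understand, still enforced by the Calico CNI even when the Calico API server is not installed:
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: default-deny-ingress
  namespace: my-app          # placeholder namespace
spec:
  podSelector: {}            # selects all pods in the namespace
  policyTypes:
  - Ingress                  # no ingress rules listed, so all ingress is denied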
Working around the problem doesn't solve the issue.
Yeah, the issue definitely has to be addressed; I just wanted it to stop shouting the error a dozen times every time I try to deploy something into my EKS cluster with Helm.
The whole point of installing this operator is being able to use projectcalico.org/v3 resources.
Extends Kubernetes network policy
Calico network policy provides a richer set of policy capabilities than Kubernetes including: policy ordering/priority, deny rules, and more flexible match rules. While Kubernetes network policy applies only to pods, Calico network policy can be applied to multiple types of endpoints including pods, VMs, and host interfaces. Finally, when used with Istio service mesh, Calico network policy supports securing applications layers 5-7 match criteria, and cryptographic identity.
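As an illustrative (made-up) example of what only the projectcalico.org/v3 API can express, the explicit deny and the ordering below have no equivalent in networking.k8s.io/v1:
apiVersion: projectcalico.org/v3
kind: GlobalNetworkPolicy
metadata:
  name: deny-telnet-egress    # hypothetical policy name
spec:
  order: 100                  # Calico-specific ordering/priority
  selector: all()             # applies to all endpoints, not just pods
  types:
  - Egress
  egress:
  - action: Deny              # explicit deny rule
    protocol: TCP
    destination:
      ports:
      - 23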
The whole point of installing this operator is being able to use projectcalico.org/v3 resources.
Ah, for us the whole point was to replace the default AWS EKS VPC CNI with the Calico CNI, to be able to deploy more pods per node and to save IPs from the VPC CIDR allocated to us - since the former gives all pods IPs from that range and also limits the number of pods per node based on node size. For us, the Calico installation using Helm from the official documentation (which installs the operator) introduced this apiServer and the related errors.
So I guess the solution is valid if you just want the CNI and are fine being limited to the K8s NetworkPolicy!
Same here, running it with Tigera on EKS. Uninstalled the API server and that resolved it. We will see what the consequences will be. According to the docs it should only block tigera CLI stuff...
@xpuska513 this seems like an issue with csi-node-driver-registrar, could you share more details of your setup (versions, etc)?
kubectl logs -f -n calico-system csi-node-driver-pzwsl -c csi-node-driver-registrar
/usr/local/bin/node-driver-registrar: error while loading shared libraries: libresolv.so.2: cannot open shared object file: No such file or directory
Everyone else (if you're still able to reproduce this issue), could you post kubectl logs for the Calico apiserver pod(s)?
@coutinhop Deployed on EKS using the official Helm chart - version 3.25, k8s version 3.22, VPC CNI 1.12.6. Let me try to deploy it again on a fresh cluster tomorrow and I can gather more info on it. Are there any logs (from pods) that would be useful for me to share with you?
A small observation from me: it works fine on older k8s (1.21.4) and an older VPC CNI release (1.8.x).
Everyone else (if you're still able to reproduce this issue), could you post kubectl logs for the Calico apiserver pod(s)?
$ kubectl logs calico-apiserver-7fb88d684f-fh5x7 -n calico-apiserver
E0517 13:12:47.384967 36471 memcache.go:287] couldn't get resource list for projectcalico.org/v3: the server is currently unable to handle the request
E0517 13:12:47.387141 36471 memcache.go:121] couldn't get resource list for projectcalico.org/v3: the server is currently unable to handle the request
E0517 13:12:47.389521 36471 memcache.go:121] couldn't get resource list for projectcalico.org/v3: the server is currently unable to handle the request
Version: v3.25.1
Build date: 2023-03-30T23:52:23+0000
Git tag ref: v3.25.1
Git commit: 82dadbce1
I0517 12:43:22.874404 1 plugins.go:158] Loaded 2 mutating admission controller(s) successfully in the following order: NamespaceLifecycle,MutatingAdmissionWebhook.
I0517 12:43:22.874578 1 plugins.go:161] Loaded 1 validating admission controller(s) successfully in the following order: ValidatingAdmissionWebhook.
I0517 12:43:22.955555 1 run_server.go:69] Running the API server
I0517 12:43:22.970749 1 run_server.go:58] Starting watch extension
W0517 12:43:22.970895 1 client_config.go:617] Neither --kubeconfig nor --master was specified. Using the inClusterConfig. This might not work.
I0517 12:43:22.976511 1 secure_serving.go:210] Serving securely on [::]:5443
I0517 12:43:22.976690 1 requestheader_controller.go:169] Starting RequestHeaderAuthRequestController
I0517 12:43:22.976781 1 shared_informer.go:255] Waiting for caches to sync for RequestHeaderAuthRequestController
I0517 12:43:22.976884 1 dynamic_serving_content.go:132] "Starting controller" name="serving-cert::/calico-apiserver-certs/tls.crt::/calico-apiserver-certs/tls.key"
I0517 12:43:22.977142 1 tlsconfig.go:240] "Starting DynamicServingCertificateController"
I0517 12:43:22.977618 1 run_server.go:80] apiserver is ready.
I0517 12:43:22.977748 1 configmap_cafile_content.go:202] "Starting controller" name="client-ca::kube-system::extension-apiserver-authentication::client-ca-file"
I0517 12:43:22.977829 1 shared_informer.go:255] Waiting for caches to sync for client-ca::kube-system::extension-apiserver-authentication::client-ca-file
I0517 12:43:22.977901 1 configmap_cafile_content.go:202] "Starting controller" name="client-ca::kube-system::extension-apiserver-authentication::requestheader-client-ca-file"
I0517 12:43:22.977966 1 shared_informer.go:255] Waiting for caches to sync for client-ca::kube-system::extension-apiserver-authentication::requestheader-client-ca-file
I0517 12:43:23.077860 1 shared_informer.go:262] Caches are synced for RequestHeaderAuthRequestController
I0517 12:43:23.078006 1 shared_informer.go:262] Caches are synced for client-ca::kube-system::extension-apiserver-authentication::client-ca-file
I0517 12:43:23.078083 1 shared_informer.go:262] Caches are synced for client-ca::kube-system::extension-apiserver-authentication::requestheader-client-ca-file
k -ncalico-apiserver logs calico-apiserver-8757dcdf8-4z79m
E0522 12:27:06.023957 1742797 memcache.go:255] couldn't get resource list for projectcalico.org/v3: the server is currently unable to handle the request
E0522 12:27:06.024770 1742797 memcache.go:106] couldn't get resource list for projectcalico.org/v3: the server is currently unable to handle the request
E0522 12:27:06.026818 1742797 memcache.go:106] couldn't get resource list for projectcalico.org/v3: the server is currently unable to handle the request
Error from server: Get "https://138.96.245.50:10250/containerLogs/calico-apiserver/calico-apiserver-8757dcdf8-4z79m/calico-apiserver": tunnel closed
For those of you running on EKS, can you confirm that the calico-apiserver is running with hostNetwork: true set?
The Kubernetes API server needs to establish connection with the Calico API server, and on EKS the Kubernetes API server runs in a separate Amazon managed VPC, meaning it doesn't have routing access to pod IPs (just host IPs). As such, the Calico API server needs to run with host networking. The tigera-operator should do this automatically for you, but I'd like to double check in case something isn't detecting this correctly.
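One quick way to check this (assuming the default deployment name created by the operator) is:
kubectl get deployment calico-apiserver -n calico-apiserver -o jsonpath='{.spec.template.spec.hostNetwork}'
An empty result or false means the pods are not using host networking.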
Everyone else (if you're still able to reproduce this issue), could you post kubectl logs for the Calico apiserver pod(s)?
$ kubectl logs calico-apiserver-7fb88d684f-fh5x7 -n calico-apiserver
E0517 13:12:47.384967 36471 memcache.go:287] couldn't get resource list for projectcalico.org/v3: the server is currently unable to handle the request
For me this was some sort of firewall issue. I configured the firewall to be more permissive and I don't see this issue anymore.
@caseydavenport For me on EKS it wasn't running on the host network for some reason.
@xpuska513 interesting. Could you share how you installed Calico / the apiserver on this cluster?
@caseydavenport I followed this guide https://docs.aws.amazon.com/eks/latest/userguide/calico.html which referenced this doc for the tigera-operator deployment https://docs.tigera.io/calico/latest/getting-started/kubernetes/helm#install-calico. Basically I deployed the Helm chart with kubernetesProvider set to EKS, and I also assume that it auto-deploys the apiserver out of the box when you deploy the operator using the Helm chart.
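For reference, the relevant Helm values from that guide look like this (only the provider setting is shown; it goes in a values file passed with -f):
installation:
  kubernetesProvider: EKS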
Yep, gotcha. That is the correct guide to follow. I realize now that if you are using the EKS VPC CNI plugin, then it is OK to have the apiserver running with hostNetwork: false, so that's unlikely to be the problem here.
Deploying today:
kubectl -n calico-apiserver logs deployment.apps/calico-apiserver
E0526 12:39:07.647072 164182 memcache.go:255] couldn't get resource list for projectcalico.org/v3: the server is currently unable to handle the request
E0526 12:39:07.737485 164182 memcache.go:106] couldn't get resource list for projectcalico.org/v3: the server is currently unable to handle the request
E0526 12:39:07.789410 164182 memcache.go:106] couldn't get resource list for projectcalico.org/v3: the server is currently unable to handle the request
Found 2 pods, using pod/calico-apiserver-59dcddc4d5-d4sfp
Version: v3.25.1
Build date: 2023-03-30T23:52:23+0000
Git tag ref: v3.25.1
Git commit: 82dadbce1
I0526 10:06:19.554880 1 plugins.go:158] Loaded 2 mutating admission controller(s) successfully in the following order: NamespaceLifecycle,MutatingAdmissionWebhook.
I0526 10:06:19.554927 1 plugins.go:161] Loaded 1 validating admission controller(s) successfully in the following order: ValidatingAdmissionWebhook.
I0526 10:06:19.643486 1 run_server.go:69] Running the API server
I0526 10:06:19.643504 1 run_server.go:58] Starting watch extension
W0526 10:06:19.643526 1 client_config.go:617] Neither --kubeconfig nor --master was specified. Using the inClusterConfig. This might not work.
I0526 10:06:19.660286 1 requestheader_controller.go:169] Starting RequestHeaderAuthRequestController
I0526 10:06:19.660394 1 shared_informer.go:255] Waiting for caches to sync for RequestHeaderAuthRequestController
I0526 10:06:19.660287 1 configmap_cafile_content.go:202] "Starting controller" name="client-ca::kube-system::extension-apiserver-authentication::client-ca-file"
I0526 10:06:19.660425 1 shared_informer.go:255] Waiting for caches to sync for client-ca::kube-system::extension-apiserver-authentication::client-ca-file
I0526 10:06:19.660318 1 configmap_cafile_content.go:202] "Starting controller" name="client-ca::kube-system::extension-apiserver-authentication::requestheader-client-ca-file"
I0526 10:06:19.660592 1 shared_informer.go:255] Waiting for caches to sync for client-ca::kube-system::extension-apiserver-authentication::requestheader-client-ca-file
I0526 10:06:19.660670 1 secure_serving.go:210] Serving securely on [::]:5443
I0526 10:06:19.660521 1 dynamic_serving_content.go:132] "Starting controller" name="serving-cert::/calico-apiserver-certs/tls.crt::/calico-apiserver-certs/tls.key"
I0526 10:06:19.660736 1 tlsconfig.go:240] "Starting DynamicServingCertificateController"
I0526 10:06:19.661221 1 run_server.go:80] apiserver is ready.
I0526 10:06:19.761160 1 shared_informer.go:262] Caches are synced for client-ca::kube-system::extension-apiserver-authentication::requestheader-client-ca-file
I0526 10:06:19.761194 1 shared_informer.go:262] Caches are synced for client-ca::kube-system::extension-apiserver-authentication::client-ca-file
I0526 10:06:19.761256 1 shared_informer.go:262] Caches are synced for RequestHeaderAuthRequestController
I followed the guide linked above by @xpuska513 too, and hostNetwork: false (well, not set). I manually installed the CRDs because I was getting errors about the annotation length exceeding the maximum.
because I was getting errors about the annotation length exceeding the maximum.
FWIW, this usually comes from using kubectl apply since kubectl adds the annotation. You should be able to do kubectl create and kubectl replace instead in order to avoid that, and it's what we currently recommend.
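For example (a sketch; calico-crds.yaml is a placeholder for whatever CRD manifest you are applying):
# First-time install; avoids the last-applied-configuration annotation that kubectl apply would add
kubectl create -f calico-crds.yaml
# Subsequent updates to CRDs that already exist
kubectl replace -f calico-crds.yaml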
I am having this issue as well.
Diagnostics:
I began by installing Calico on an AWS EKS cluster. I used the Helm chart at v3.25.1. I skipped the Customize Helm Chart section of the documentation, because the [AWS documentation](https://docs.aws.amazon.com/eks/latest/userguide/calico.html#calico-install) brings you directly to this section, so I did not bother reading the previous sections. As a result, my initial installation was without installation.kubernetesProvider: EKS.
My initial installation succeeded, and I was about to proceed with the remainder of the EKS documentation. However, I did not like my choice of namespace for the Helm installation: I chose the namespace calico instead of tigera-operator, and I wanted to match the AWS documentation. As a result, I attempted to delete everything and reinstall the Helm chart. Using tigera-operator for the namespace was not allowed, possibly due to a bug in the Helm chart, and I gave up and went back to using the calico namespace.
I do not know where things went wrong after this. I don't recall whether I attempted to helm uninstall or something else. But I do remember attempting to delete the namespaces and getting into situations where a namespace got stuck deleting due to various finalizers. I attempted to search for the various stuck entities and delete them. I did so successfully, and I was ultimately able to get the namespaces to delete. I believe I ran kubectl get ns calico-system -o yaml and it warned me it was stuck deleting service accounts.
Current Symptoms
I can sort of delete and reinstall Calico. If I delete Calico, even using helm uninstall, things get strangely stuck. The calico and calico-apiserver namespaces will delete, but calico-system remains. If I run kubectl delete ns calico-system, that gets stuck due to the finalizers. I can then describe the namespace, where it will warn me about the serviceaccounts. If I delete the finalizers for the serviceaccounts, the calico-system namespace will finally delete. I can then delete the calico namespace.
There are definitely leftover resources on my K8s cluster. I found a tool, kubectl really get all, which helps show me the additional resources. I had a strong suspicion that my installation was corrupt and that the best way out would be to literally rebuild my entire cluster, but that is really not a good idea operationally. When I have live customers, we cannot be expected to rebuild the entire cluster if Calico has an issue.
I tried to delete all leftover resources to see if I could get a reinstall working.
# List out all resources in your cluster
~/kubectl-really-get-all --all-namespaces > /tmp/tmp
# Identify Calico components
cat /tmp/tmp | grep -iE "(tigera|calico)"
Once I have the list of Calico components, I can pipe the output into xargs to delete everything:
# DANGEROUS! DO NOT DO UNLESS YOU KNOW WHAT YOU ARE DOING!
... | awk '{print $1}' | xargs -L1 -I input bash -c "kubectl delete input"
This initially kept getting stuck, so I would need to manually modify the items that got stuck and remove the finalizer, such as kubectl edit clusterrole.rbac.authorization.k8s.io/calico-node.
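A non-interactive alternative to editing each stuck object (a sketch only; calico-node is just the example from above, and force-clearing finalizers can leave orphaned state, so use with care):
kubectl patch clusterrole calico-node --type=merge -p '{"metadata":{"finalizers":null}}'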
I then verified that everything was completely deleted using kubectl really get all. However, after reinstallation, Calico came online, but the calico-system namespace got stuck in a Terminating state immediately after being created.
status:
conditions:
- lastTransitionTime: "2023-06-23T16:29:14Z"
message: 'Discovery failed for some groups, 1 failing: unable to retrieve the
complete list of server APIs: projectcalico.org/v3: the server is currently
unable to handle the request'
reason: DiscoveryFailed
status: "True"
type: NamespaceDeletionDiscoveryFailure
- lastTransitionTime: "2023-06-23T16:29:14Z"
message: All legacy kube types successfully parsed
reason: ParsedGroupVersions
status: "False"
type: NamespaceDeletionGroupVersionParsingFailure
- lastTransitionTime: "2023-06-23T16:29:14Z"
message: All content successfully deleted, may be waiting on finalization
reason: ContentDeleted
status: "False"
type: NamespaceDeletionContentFailure
- lastTransitionTime: "2023-06-23T16:29:14Z"
message: All content successfully removed
reason: ContentRemoved
status: "False"
type: NamespaceContentRemaining
- lastTransitionTime: "2023-06-23T16:29:14Z"
message: All content-preserving finalizers finished
reason: ContentHasNoFinalizers
status: "False"
type: NamespaceFinalizersRemaining
phase: Terminating
At this point, I'm not sure how to clean up my system. I may try it one more time, but I may be stuck with a VPC + cluster rebuild.