
Couldn't get resource list for projectcalico.org/v3: the server is currently unable to handle the request

Open lucasscheepers opened this issue 2 years ago • 50 comments

@caseydavenport Raising this as a separate issue, as you requested.

I installed the latest version of Calico using this Helm chart. The kube-apiserver-kmaster1 pod logs the following error: v3.projectcalico.org failed with: failing or missing response from https://**:443/apis/projectcalico.org/v3.

Also, after seemingly random kubectl commands, errors like the following are printed:

E0406 15:33:42.335160   55793 memcache.go:106] couldn't get resource list for projectcalico.org/v3: the server is currently unable to handle the request
NAME       STATUS   ROLES           AGE   VERSION
kmaster1   Ready    control-plane   19d   v1.26.3
kworker1   Ready    <none>          18d   v1.26.3
kworker2   Ready    <none>          18d   v1.26.3

These CRDs are automatically installed by the Helm chart mentioned above.

--> k api-resources | grep calico
E0406 15:34:26.465805   55853 memcache.go:255] couldn't get resource list for projectcalico.org/v3: the server is currently unable to handle the request
E0406 15:34:26.481896   55853 memcache.go:106] couldn't get resource list for projectcalico.org/v3: the server is currently unable to handle the request
bgpconfigurations                              crd.projectcalico.org/v1               false        BGPConfiguration
bgppeers                                       crd.projectcalico.org/v1               false        BGPPeer
blockaffinities                                crd.projectcalico.org/v1               false        BlockAffinity
caliconodestatuses                             crd.projectcalico.org/v1               false        CalicoNodeStatus
clusterinformations                            crd.projectcalico.org/v1               false        ClusterInformation
felixconfigurations                            crd.projectcalico.org/v1               false        FelixConfiguration
globalnetworkpolicies                          crd.projectcalico.org/v1               false        GlobalNetworkPolicy
globalnetworksets                              crd.projectcalico.org/v1               false        GlobalNetworkSet
hostendpoints                                  crd.projectcalico.org/v1               false        HostEndpoint
ipamblocks                                     crd.projectcalico.org/v1               false        IPAMBlock
ipamconfigs                                    crd.projectcalico.org/v1               false        IPAMConfig
ipamhandles                                    crd.projectcalico.org/v1               false        IPAMHandle
ippools                                        crd.projectcalico.org/v1               false        IPPool
ipreservations                                 crd.projectcalico.org/v1               false        IPReservation
kubecontrollersconfigurations                  crd.projectcalico.org/v1               false        KubeControllersConfiguration
networkpolicies                                crd.projectcalico.org/v1               true         NetworkPolicy
networksets                                    crd.projectcalico.org/v1               true         NetworkSet
error: unable to retrieve the complete list of server APIs: projectcalico.org/v3: the server is currently unable to handle the request

Do I understand it correctly that these crd.projectcalico.org/v1 CRDs are still needed - so I should not delete them - and that I need to manually install the v3 CRDs? If so, where can I download these v3 CRDs? I can't find them.

I believe chet-tuttle-3 is facing similar issues.

lucasscheepers avatar Apr 14 '23 10:04 lucasscheepers

Do I understand it correctly that these crd.projectcalico.org/v1 CRDs are still needed - so I should not delete them - and that I need to manually install the v3 CRDs? If so, where can I download these v3 CRDs? I can't find them.

The v1 resources are CRDs and should be present - definitely don't delete those.

The v3 resources are not CRDs - they are implemented by the calico-apiserver pod in the calico-apiserver namespace.
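
To illustrate the difference, a quick check (a sketch, assuming the operator-managed names used above):

# The v1 resources are backed by CRDs stored in the Kubernetes API:
kubectl get crds | grep projectcalico.org

# The v3 resources are served by the aggregated Calico API server, registered as an APIService:
kubectl get apiservice v3.projectcalico.org
kubectl get pods -n calico-apiserver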

the server is currently unable to handle the request

This suggests a problem with the Calico API server, or a problem with the kube-apiserver being unable to communicate with the Calico API server. I'd:

  • kubectl describe apiservice to see if there's any breadcrumbs to follow there.
  • kubectl logs --tail=-1 -n calico-apiserver -l k8s-app=calico-apiserver to get the full API server logs.
  • kubectl describe tigerastatus apiserver for potentially further breadcrumbs.
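
A couple of extra checks that may also help narrow it down (sketches, assuming the operator's default calico-api Service and calico-apiserver namespace):

# Is the backing Service resolving to healthy pod endpoints?
kubectl get svc -n calico-apiserver calico-api
kubectl get endpoints -n calico-apiserver calico-api

# Are the apiserver pods actually Ready?
kubectl get pods -n calico-apiserver -o wide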

caseydavenport avatar Apr 14 '23 15:04 caseydavenport

The only APIService with a status of False is v3.projectcalico.org, which has the following error message: failing or missing response from https://***:443/apis/projectcalico.org/v3: Get "https://***:443/apis/projectcalico.org/v3": context deadline exceeded

➜ ~ kubectl describe apiservice v3.projectcalico.org

Name:         v3.projectcalico.org
Namespace:    
Labels:       <none>
Annotations:  <none>
API Version:  apiregistration.k8s.io/v1
Kind:         APIService
Metadata: 
  Creation Timestamp:  2023-04-06T12:54:20Z
  Managed Fields:
    API Version:  apiregistration.k8s.io/v1
    Fields Type:  FieldsV1
    fieldsV1:
      f:metadata:
        f:ownerReferences:
          .:
          k:{"uid":"***"}:
      f:spec:
        f:caBundle:
        f:group:
        f:groupPriorityMinimum:
        f:service:
          .:
          f:name:
          f:namespace:
          f:port:
        f:version:
        f:versionPriority:
    Manager:      operator
    Operation:    Update
    Time:         2023-04-06T12:54:20Z
    API Version:  apiregistration.k8s.io/v1
    Fields Type:  FieldsV1
    fieldsV1:
      f:status:
        f:conditions:
          .:
          k:{"type":"Available"}:
            .:
            f:lastTransitionTime:
            f:message:
            f:reason:
            f:status:
            f:type:
    Manager:      kube-apiserver
    Operation:    Update
    Subresource:  status
    Time:         2023-04-18T12:35:03Z
  Owner References:
    API Version:           operator.tigera.io/v1
    Block Owner Deletion:  true
    Controller:            true
    Kind:                  APIServer
    Name:                  default
    UID:                   ***
  Resource Version:       ***
  UID:                     ***
Spec:
  Ca Bundle:              ***
  Group:                   projectcalico.org
  Group Priority Minimum:  1500
  Service:
    Name:            calico-api
    Namespace:       calico-apiserver
    Port:            443
  Version:           v3
  Version Priority:  200
Status:
  Conditions:
    Last Transition Time:  2023-04-06T12:54:20Z
    Message:               failing or missing response from https://10.107.208.239:443/apis/projectcalico.org/v3: Get "https://10.107.208.239:443/apis/projectcalico.org/v3": dial tcp 10.107.208.239:443: i/o timeout
    Reason:                FailedDiscoveryCheck
    Status:                False
    Type:                  Available
Events:                    <none>

The logs of the calico-apiserver look like this: ➜ ~ kubectl logs --tail=-1 -n calico-apiserver -l k8s-app=calico-apiserver

Version:      v3.25.1
Build date:   2023-03-30T23:52:23+0000
Git tag ref:  v3.25.1
Git commit:   82dadbce1
I0413 15:15:19.483989       1 plugins.go:158] Loaded 2 mutating admission controller(s) successfully in the following order: NamespaceLifecycle,MutatingAdmissionWebhook.
I0413 15:15:19.484036       1 plugins.go:161] Loaded 1 validating admission controller(s) successfully in the following order: ValidatingAdmissionWebhook.
I0413 15:15:19.604542       1 run_server.go:69] Running the API server
I0413 15:15:19.604578       1 run_server.go:58] Starting watch extension
W0413 15:15:19.606431       1 client_config.go:617] Neither --kubeconfig nor --master was specified.  Using the inClusterConfig.  This might not work.
I0413 15:15:19.630055       1 secure_serving.go:210] Serving securely on [::]:5443
I0413 15:15:19.630147       1 dynamic_serving_content.go:132] "Starting controller" name="serving-cert::/calico-apiserver-certs/tls.crt::/calico-apiserver-certs/tls.key"
I0413 15:15:19.630257       1 tlsconfig.go:240] "Starting DynamicServingCertificateController"
I0413 15:15:19.630679       1 run_server.go:80] apiserver is ready.
I0413 15:15:19.631104       1 requestheader_controller.go:169] Starting RequestHeaderAuthRequestController
I0413 15:15:19.631114       1 shared_informer.go:255] Waiting for caches to sync for RequestHeaderAuthRequestController
I0413 15:15:19.631204       1 configmap_cafile_content.go:202] "Starting controller" name="client-ca::kube-system::extension-apiserver-authentication::client-ca-file"
I0413 15:15:19.631212       1 shared_informer.go:255] Waiting for caches to sync for client-ca::kube-system::extension-apiserver-authentication::client-ca-file
I0413 15:15:19.631282       1 configmap_cafile_content.go:202] "Starting controller" name="client-ca::kube-system::extension-apiserver-authentication::requestheader-client-ca-file"
I0413 15:15:19.631290       1 shared_informer.go:255] Waiting for caches to sync for client-ca::kube-system::extension-apiserver-authentication::requestheader-client-ca-file
I0413 15:15:19.732007       1 shared_informer.go:262] Caches are synced for client-ca::kube-system::extension-apiserver-authentication::requestheader-client-ca-file
I0413 15:15:19.732076       1 shared_informer.go:262] Caches are synced for RequestHeaderAuthRequestController
I0413 15:15:19.732510       1 shared_informer.go:262] Caches are synced for client-ca::kube-system::extension-apiserver-authentication::client-ca-file
Version:      v3.25.1
Build date:   2023-03-30T23:52:23+0000
Git tag ref:  v3.25.1
Git commit:   82dadbce1
I0413 15:15:45.802642       1 plugins.go:158] Loaded 2 mutating admission controller(s) successfully in the following order: NamespaceLifecycle,MutatingAdmissionWebhook.
I0413 15:15:45.802806       1 plugins.go:161] Loaded 1 validating admission controller(s) successfully in the following order: ValidatingAdmissionWebhook.
I0413 15:15:45.871553       1 run_server.go:58] Starting watch extension
I0413 15:15:45.871726       1 run_server.go:69] Running the API server
W0413 15:15:45.872885       1 client_config.go:617] Neither --kubeconfig nor --master was specified.  Using the inClusterConfig.  This might not work.
I0413 15:15:45.885723       1 secure_serving.go:210] Serving securely on [::]:5443
I0413 15:15:45.886356       1 requestheader_controller.go:169] Starting RequestHeaderAuthRequestController
I0413 15:15:45.886370       1 shared_informer.go:255] Waiting for caches to sync for RequestHeaderAuthRequestController
I0413 15:15:45.886523       1 run_server.go:80] apiserver is ready.
I0413 15:15:45.886549       1 dynamic_serving_content.go:132] "Starting controller" name="serving-cert::/calico-apiserver-certs/tls.crt::/calico-apiserver-certs/tls.key"
I0413 15:15:45.886667       1 tlsconfig.go:240] "Starting DynamicServingCertificateController"
I0413 15:15:45.888123       1 configmap_cafile_content.go:202] "Starting controller" name="client-ca::kube-system::extension-apiserver-authentication::requestheader-client-ca-file"
I0413 15:15:45.888133       1 shared_informer.go:255] Waiting for caches to sync for client-ca::kube-system::extension-apiserver-authentication::requestheader-client-ca-file
I0413 15:15:45.888363       1 configmap_cafile_content.go:202] "Starting controller" name="client-ca::kube-system::extension-apiserver-authentication::client-ca-file"
I0413 15:15:45.888375       1 shared_informer.go:255] Waiting for caches to sync for client-ca::kube-system::extension-apiserver-authentication::client-ca-file
I0413 15:15:45.986627       1 shared_informer.go:262] Caches are synced for RequestHeaderAuthRequestController
I0413 15:15:45.988477       1 shared_informer.go:262] Caches are synced for client-ca::kube-system::extension-apiserver-authentication::client-ca-file
I0413 15:15:45.988829       1 shared_informer.go:262] Caches are synced for client-ca::kube-system::extension-apiserver-authentication::requestheader-client-ca-file

And the tigerastatus apiserver looks like this: ➜ ~ kubectl describe tigerastatus apiserver

Name:         apiserver
Namespace:    
Labels:       <none>
Annotations:  <none>
API Version:  operator.tigera.io/v1
Kind:         TigeraStatus
Metadata:
  Creation Timestamp:  2023-03-24T16:01:19Z
  Generation:          1
  Managed Fields:
    API Version:  operator.tigera.io/v1
    Fields Type:  FieldsV1
    fieldsV1:
      f:spec:
    Manager:      operator
    Operation:    Update
    Time:         2023-03-24T16:01:19Z
    API Version:  operator.tigera.io/v1
    Fields Type:  FieldsV1
    fieldsV1:
      f:status:
        .:
        f:conditions:
    Manager:         operator
    Operation:       Update
    Subresource:     status
    Time:            2023-04-13T15:15:56Z
  Resource Version:  ***
  UID:               ***
Spec:
Status:
  Conditions:
    Last Transition Time:  2023-04-06T12:54:24Z
    Message:               All Objects Available
    Observed Generation:   1
    Reason:                AllObjectsAvailable
    Status:                False
    Type:                  Degraded
    Last Transition Time:  2023-04-13T15:15:56Z
    Message:               All objects available
    Observed Generation:   1
    Reason:                AllObjectsAvailable
    Status:                True
    Type:                  Available
    Last Transition Time:  2023-04-13T15:15:56Z
    Message:               All Objects Available
    Observed Generation:   1
    Reason:                AllObjectsAvailable
    Status:                False
    Type:                  Progressing
Events:                    <none>

@caseydavenport Can you maybe point me in the correct direction with this information?

lucasscheepers avatar Apr 18 '23 12:04 lucasscheepers

I am also hitting this issue; my cluster runs on OpenStack VMs.

kinbod avatar Apr 26 '23 06:04 kinbod

@lucasscheepers I was running into the same issue and was able to get around it by following the Manifest Install directions here: https://docs.tigera.io/calico/latest/operations/install-apiserver

Specifically, the patch command fixed the issue:

kubectl patch apiservice v3.projectcalico.org -p \
  "{\"spec\": {\"caBundle\": \"$(kubectl get secret -n calico-apiserver calico-apiserver-certs -o go-template='{{ index .data "apiserver.crt" }}')\"}}"
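
If you want to verify the patch took effect, one rough check (truncating the output just for eyeballing) is to compare the APIService's caBundle against the serving cert in the secret:

kubectl get apiservice v3.projectcalico.org -o jsonpath='{.spec.caBundle}' | head -c 60; echo
kubectl get secret -n calico-apiserver calico-apiserver-certs -o go-template='{{ index .data "apiserver.crt" }}' | head -c 60; echo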

0HFgX1pM8hUbvsCpAl3D avatar Apr 28 '23 19:04 0HFgX1pM8hUbvsCpAl3D

Getting the same issue while trying to install the Prometheus operator.

❯ k get apiservices.apiregistration.k8s.io -A
NAME                             SERVICE                       AVAILABLE                      AGE
v2.autoscaling                   Local                         True                           59d
v2beta1.helm.toolkit.fluxcd.io   Local                         True                           11d
v2beta2.autoscaling              Local                         True                           59d
v3.projectcalico.org             calico-apiserver/calico-api   False (FailedDiscoveryCheck)   6m38s

❯ k get apiservices.apiregistration.k8s.io  v3.projectcalico.org -o yaml
apiVersion: apiregistration.k8s.io/v1
kind: APIService
metadata:
  creationTimestamp: "2023-05-08T00:17:15Z"
  name: v3.projectcalico.org
  ownerReferences:
  - apiVersion: operator.tigera.io/v1
    blockOwnerDeletion: true
    controller: true
    kind: APIServer
    name: default
    uid: 34d2ccfa-07e2-4ec8-82b1-25a3e9e3be73
  resourceVersion: "25407370"
  uid: 830669ef-f81a-4e2d-9765-e3e2066f8f33
spec:
  caBundle: LS0tLS1CRUdJTiBDRVJUSUZJQ0FURS0tLS0tCk1JSUR5akNDQXJLZ0F3SUJBZ0lJVjgwSlNBRkFKWkF3RFFZSktvWklodmNOQVFFTEJRQXdJVEVmTUIwR0ExVUUKQXhNV2RHbG5aWEpoTFc5d1pYSmhkRzl5TFhOcFoyNWxjakFlRncweU16QTFNRGN3TVRNMU5EVmF
  group: projectcalico.org
  groupPriorityMinimum: 1500
  service:
    name: calico-api
    namespace: calico-apiserver
    port: 443
  version: v3
  versionPriority: 200
status:
  conditions:
  - lastTransitionTime: "2023-05-08T00:17:15Z"
    message: 'failing or missing response from https://10.20.3.132:5443/apis/projectcalico.org/v3:
      Get "https://10.20.3.132:5443/apis/projectcalico.org/v3": dial tcp 10.20.3.132:5443:
      i/o timeout'
    reason: FailedDiscoveryCheck
    status: "False"
    type: Available

ndacic avatar May 08 '23 00:05 ndacic

Resolved by using a newer Helm release version for the Prometheus operator.

ndacic avatar May 08 '23 06:05 ndacic

But I am still getting:

❯ k get apiservices.apiregistration.k8s.io v3.projectcalico.org -o yaml
apiVersion: apiregistration.k8s.io/v1
kind: APIService
metadata:
  creationTimestamp: "2023-05-11T19:04:37Z"
  name: v3.projectcalico.org
  ownerReferences:
spec:
  group: projectcalico.org
  groupPriorityMinimum: 1500
  service:
    name: calico-api
    namespace: calico-apiserver
    port: 443
  version: v3
  versionPriority: 200
status:
  conditions:
  - lastTransitionTime: "2023-05-11T19:04:37Z"
    message: 'failing or missing response from https://10.10.101.253:5443/apis/projectcalico.org/v3:
      Get "https://10.10.101.253:5443/apis/projectcalico.org/v3": dial tcp 10.10.101.253:5443:
      i/o timeout'
    reason: FailedDiscoveryCheck
    status: "False"
    type: Available

ndacic avatar May 11 '23 19:05 ndacic

@lucasscheepers did you manage to resolve it? Namespace deletions get stuck in a terminating state because this APIService is not responsive. Any way of turning it off?

ndacic avatar May 11 '23 19:05 ndacic

We are facing the same problem when installing the tigera-operator Helm chart with the APIServer enabled on an EKS cluster.

suciuandrei94 avatar May 12 '23 13:05 suciuandrei94

We are facing the same problem when installing the tigera-operator Helm chart with the APIServer enabled on an EKS cluster.

I was also facing the same issue, and fixed it for the time being by running

kubectl delete apiserver default

Based on https://docs.tigera.io/calico/latest/operations/install-apiserver#uninstall-the-calico-api-server

Since we are using the default Calico Helm-chart-based install, I think the apiserver was getting created but perhaps not configured properly. And since I doubt we need to update Calico settings via kubectl as part of our use case, I think it is best to delete it for now. I will also try to find a Helm value in the tigera-operator chart to disable this from the start, if possible.

PS: I am new to Calico, so please let me know if this is "unsafe" to remove, although the documentation above does not seem to suggest so.

EDIT: It is easy to disable the apiServer with the Helm values:

apiServer:
  enabled: true # Change to false
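
If you prefer not to edit the values file, the same thing can be done at upgrade time (a sketch; the release name calico and namespace tigera-operator are assumptions from the docs, adjust to your install):

helm upgrade calico projectcalico/tigera-operator \
  --namespace tigera-operator \
  --reuse-values \
  --set apiServer.enabled=false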

Also, it seems it is not so important after all - https://docs.tigera.io/calico/latest/reference/architecture/overview#calico-api-server. The component architecture page says it is only needed to manage Calico with kubectl, which I think would logically mean it is not used from "within" the cluster.

dhananjaipai avatar May 12 '23 14:05 dhananjaipai

If you uninstall the apiserver, then you won't be able to install Calico network policies with a Helm chart. Working around the problem doesn't solve the issue.
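
For illustration, this is the kind of resource that only works while the aggregated v3 API is healthy (a made-up example policy, not from this thread):

kubectl create -f - <<'EOF'
apiVersion: projectcalico.org/v3
kind: GlobalNetworkPolicy
metadata:
  name: allow-https-example
spec:
  selector: app == 'example'
  types:
    - Ingress
  ingress:
    - action: Allow
      protocol: TCP
      destination:
        ports:
          - 443
EOF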

suciuandrei94 avatar May 12 '23 15:05 suciuandrei94

For some reason, the calico-apiserver pod is failing its liveness probes because the apiserver is not starting correctly, or something is not working at all; because of that, the APIService is reported as FailedDiscoveryCheck. I tried to play around with the deployment and other things but wasn't able to achieve anything. Is there any way to enable debug logs for the apiserver?

I also saw that the CSI node driver for Calico was failing with the following error:

kubectl logs -f -n calico-system csi-node-driver-pzwsl -c csi-node-driver-registrar
/usr/local/bin/node-driver-registrar: error while loading shared libraries: libresolv.so.2: cannot open shared object file: No such file or directory

xpuska513 avatar May 12 '23 15:05 xpuska513

If you uninstall the apiserver, then you won't be able to install Calico network policies with a Helm chart.

I am a bit confused and might be stupid here, but I think that, granted, without the Calico API server you will not be able to use the projectcalico.org/v3 apiVersion, but you should still be able to use networking.k8s.io/v1 for the NetworkPolicy resource? I don't know if there is a major difference between the two, and a quick search says I can still use the latter in a Kubernetes cluster with the Calico CNI installed.
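
For what it's worth, a minimal Kubernetes-native policy like the one below is enforced by the Calico CNI dataplane and does not go through the Calico API server (illustrative only):

kubectl apply -f - <<'EOF'
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: default-deny-ingress
  namespace: default
spec:
  podSelector: {}
  policyTypes:
    - Ingress
EOF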

Working around the problem doesn't solve the issue.

Yeah, the issue definitely has to be addressed; I just wanted it to stop shouting the error a dozen times every time I try to deploy something into my EKS cluster with Helm.

dhananjaipai avatar May 12 '23 15:05 dhananjaipai

The whole point of installing this operator is being able to use projectcalico.org/v3 resources.

Extends Kubernetes network policy

Calico network policy provides a richer set of policy capabilities than Kubernetes including: policy ordering/priority, deny rules, and more flexible match rules. While Kubernetes network policy applies only to pods, Calico network policy can be applied to multiple types of endpoints including pods, VMs, and host interfaces. Finally, when used with Istio service mesh, Calico network policy supports securing applications layers 5-7 match criteria, and cryptographic identity.

suciuandrei94 avatar May 14 '23 08:05 suciuandrei94

The whole point of installing this operator is being able to use projectcalico.org/v3 resources.

Ah, for us the whole point was to replace the default AWS EKS VPC CNI with the Calico CNI, to be able to deploy more pods per node and save IPs from the VPC CIDR allocated to us - since the former gives all pods IPs from this range and also limits the number of pods per node based on node size. For us, the Calico installation using Helm from the official documentation (which installs the operator) introduced this apiServer and the related errors.

So, I guess the solution is valid if you just want the CNI and can live with being limited to the K8s NetworkPolicy!

dhananjaipai avatar May 14 '23 08:05 dhananjaipai

Same here, running it with the Tigera operator on EKS. Uninstalled the API server and that resolved it. We will see what the consequences will be. According to the docs it should only block tigera CLI stuff...

ndacic avatar May 16 '23 16:05 ndacic

@xpuska513 this seems like an issue with csi-node-driver-registrar; could you share more details of your setup (versions, etc.)?

kubectl logs -f -n calico-system csi-node-driver-pzwsl -c csi-node-driver-registrar
/usr/local/bin/node-driver-registrar: error while loading shared libraries: libresolv.so.2: cannot open shared object file: No such file or directory

Everyone else (if you're still able to reproduce this issue), could you post kubectl logs for the Calico apiserver pod(s)?

coutinhop avatar May 16 '23 16:05 coutinhop

@coutinhop Deployed on EKS using the official Helm chart - Calico version 3.25, k8s version 3.22, VPC CNI 1.12.6. Let me try to deploy it again on a fresh cluster tomorrow and I can gather more info on it. Are there any logs (from pods) that would be useful for me to share with you?

xpuska513 avatar May 16 '23 20:05 xpuska513

Small observation from me: it works fine on an older k8s (1.21.4) and an older VPC CNI release (1.8.x).

xpuska513 avatar May 16 '23 20:05 xpuska513

Everyone else (if you're still able to reproduce this issue), could you post kubectl logs for the Calico apiserver pod(s)?

$ kubectl logs calico-apiserver-7fb88d684f-fh5x7 -n calico-apiserver
E0517 13:12:47.384967   36471 memcache.go:287] couldn't get resource list for projectcalico.org/v3: the server is currently unable to handle the request
E0517 13:12:47.387141   36471 memcache.go:121] couldn't get resource list for projectcalico.org/v3: the server is currently unable to handle the request
E0517 13:12:47.389521   36471 memcache.go:121] couldn't get resource list for projectcalico.org/v3: the server is currently unable to handle the request
Version:      v3.25.1
Build date:   2023-03-30T23:52:23+0000
Git tag ref:  v3.25.1
Git commit:   82dadbce1
I0517 12:43:22.874404       1 plugins.go:158] Loaded 2 mutating admission controller(s) successfully in the following order: NamespaceLifecycle,MutatingAdmissionWebhook.
I0517 12:43:22.874578       1 plugins.go:161] Loaded 1 validating admission controller(s) successfully in the following order: ValidatingAdmissionWebhook.
I0517 12:43:22.955555       1 run_server.go:69] Running the API server
I0517 12:43:22.970749       1 run_server.go:58] Starting watch extension
W0517 12:43:22.970895       1 client_config.go:617] Neither --kubeconfig nor --master was specified.  Using the inClusterConfig.  This might not work.
I0517 12:43:22.976511       1 secure_serving.go:210] Serving securely on [::]:5443
I0517 12:43:22.976690       1 requestheader_controller.go:169] Starting RequestHeaderAuthRequestController
I0517 12:43:22.976781       1 shared_informer.go:255] Waiting for caches to sync for RequestHeaderAuthRequestController
I0517 12:43:22.976884       1 dynamic_serving_content.go:132] "Starting controller" name="serving-cert::/calico-apiserver-certs/tls.crt::/calico-apiserver-certs/tls.key"
I0517 12:43:22.977142       1 tlsconfig.go:240] "Starting DynamicServingCertificateController"
I0517 12:43:22.977618       1 run_server.go:80] apiserver is ready.
I0517 12:43:22.977748       1 configmap_cafile_content.go:202] "Starting controller" name="client-ca::kube-system::extension-apiserver-authentication::client-ca-file"
I0517 12:43:22.977829       1 shared_informer.go:255] Waiting for caches to sync for client-ca::kube-system::extension-apiserver-authentication::client-ca-file
I0517 12:43:22.977901       1 configmap_cafile_content.go:202] "Starting controller" name="client-ca::kube-system::extension-apiserver-authentication::requestheader-client-ca-file"
I0517 12:43:22.977966       1 shared_informer.go:255] Waiting for caches to sync for client-ca::kube-system::extension-apiserver-authentication::requestheader-client-ca-file
I0517 12:43:23.077860       1 shared_informer.go:262] Caches are synced for RequestHeaderAuthRequestController
I0517 12:43:23.078006       1 shared_informer.go:262] Caches are synced for client-ca::kube-system::extension-apiserver-authentication::client-ca-file
I0517 12:43:23.078083       1 shared_informer.go:262] Caches are synced for client-ca::kube-system::extension-apiserver-authentication::requestheader-client-ca-file

vhelke avatar May 17 '23 13:05 vhelke

k -ncalico-apiserver logs calico-apiserver-8757dcdf8-4z79m
E0522 12:27:06.023957 1742797 memcache.go:255] couldn't get resource list for projectcalico.org/v3: the server is currently unable to handle the request
E0522 12:27:06.024770 1742797 memcache.go:106] couldn't get resource list for projectcalico.org/v3: the server is currently unable to handle the request
E0522 12:27:06.026818 1742797 memcache.go:106] couldn't get resource list for projectcalico.org/v3: the server is currently unable to handle the request
Error from server: Get "https://138.96.245.50:10250/containerLogs/calico-apiserver/calico-apiserver-8757dcdf8-4z79m/calico-apiserver": tunnel closed

turletti avatar May 22 '23 10:05 turletti

For those of you running on EKS, can you confirm that the calico-apiserver is running with hostNetwork: true set?

The Kubernetes API server needs to establish a connection with the Calico API server, and on EKS the Kubernetes API server runs in a separate Amazon-managed VPC, meaning it doesn't have routing access to pod IPs (just host IPs). As such, the Calico API server needs to run with host networking. The tigera-operator should do this automatically for you, but I'd like to double-check in case something isn't detecting this correctly.
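
A quick way to check (a sketch, assuming the operator-created Deployment name):

# Prints "true" if the apiserver pods use host networking; empty otherwise
kubectl get deployment -n calico-apiserver calico-apiserver \
  -o jsonpath='{.spec.template.spec.hostNetwork}{"\n"}'

# Or compare the pod IPs against the node IPs
kubectl get pods -n calico-apiserver -o wide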

caseydavenport avatar May 22 '23 17:05 caseydavenport

Everyone else (if you're still able to reproduce this issue), could you post kubectl logs for the Calico apiserver pod(s)?

$ kubectl logs calico-apiserver-7fb88d684f-fh5x7 -n calico-apiserver
E0517 13:12:47.384967   36471 memcache.go:287] couldn't get resource list for projectcalico.org/v3: the server is currently 

For me this was some sort of firewall issue. I configured the firewall to be more permissive and I don't see this issue anymore.

vhelke avatar May 23 '23 18:05 vhelke

@caseydavenport for me on EKS it wasn't running on the host network for some reason.

xpuska513 avatar May 24 '23 06:05 xpuska513

@xpuska513 interesting. Could you share how you installed Calico / the apiserver on this cluster?

caseydavenport avatar May 24 '23 16:05 caseydavenport

@caseydavenport I followed this guide https://docs.aws.amazon.com/eks/latest/userguide/calico.html, which referenced this doc for the tigera-operator deployment: https://docs.tigera.io/calico/latest/getting-started/kubernetes/helm#install-calico. Basically I deployed the Helm chart with kubernetesProvider set to EKS, and I assume that it auto-deploys the apiserver out of the box when you deploy the operator using the Helm chart.
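
Roughly, the install looked like the commands below, per that guide (version and release name from memory, so treat this as a sketch):

helm repo add projectcalico https://docs.tigera.io/calico/charts
helm install calico projectcalico/tigera-operator --version v3.25.1 \
  --namespace tigera-operator --create-namespace \
  --set installation.kubernetesProvider=EKS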

xpuska513 avatar May 24 '23 18:05 xpuska513

Yep, gotcha. That is the correct guide to follow. I realize now that if you are using the EKS VPC CNI plugin, then it is OK to have the apiserver running with hostNetwork: false, so that's unlikely to be the problem here.

caseydavenport avatar May 24 '23 18:05 caseydavenport

Deploying today:

kubectl -n calico-apiserver logs deployment.apps/calico-apiserver
E0526 12:39:07.647072  164182 memcache.go:255] couldn't get resource list for projectcalico.org/v3: the server is currently unable to handle the request
E0526 12:39:07.737485  164182 memcache.go:106] couldn't get resource list for projectcalico.org/v3: the server is currently unable to handle the request
E0526 12:39:07.789410  164182 memcache.go:106] couldn't get resource list for projectcalico.org/v3: the server is currently unable to handle the request
Found 2 pods, using pod/calico-apiserver-59dcddc4d5-d4sfp
Version:      v3.25.1
Build date:   2023-03-30T23:52:23+0000
Git tag ref:  v3.25.1
Git commit:   82dadbce1
I0526 10:06:19.554880       1 plugins.go:158] Loaded 2 mutating admission controller(s) successfully in the following order: NamespaceLifecycle,MutatingAdmissionWebhook.
I0526 10:06:19.554927       1 plugins.go:161] Loaded 1 validating admission controller(s) successfully in the following order: ValidatingAdmissionWebhook.
I0526 10:06:19.643486       1 run_server.go:69] Running the API server
I0526 10:06:19.643504       1 run_server.go:58] Starting watch extension
W0526 10:06:19.643526       1 client_config.go:617] Neither --kubeconfig nor --master was specified.  Using the inClusterConfig.  This might not work.
I0526 10:06:19.660286       1 requestheader_controller.go:169] Starting RequestHeaderAuthRequestController
I0526 10:06:19.660394       1 shared_informer.go:255] Waiting for caches to sync for RequestHeaderAuthRequestController
I0526 10:06:19.660287       1 configmap_cafile_content.go:202] "Starting controller" name="client-ca::kube-system::extension-apiserver-authentication::client-ca-file"
I0526 10:06:19.660425       1 shared_informer.go:255] Waiting for caches to sync for client-ca::kube-system::extension-apiserver-authentication::client-ca-file
I0526 10:06:19.660318       1 configmap_cafile_content.go:202] "Starting controller" name="client-ca::kube-system::extension-apiserver-authentication::requestheader-client-ca-file"
I0526 10:06:19.660592       1 shared_informer.go:255] Waiting for caches to sync for client-ca::kube-system::extension-apiserver-authentication::requestheader-client-ca-file
I0526 10:06:19.660670       1 secure_serving.go:210] Serving securely on [::]:5443
I0526 10:06:19.660521       1 dynamic_serving_content.go:132] "Starting controller" name="serving-cert::/calico-apiserver-certs/tls.crt::/calico-apiserver-certs/tls.key"
I0526 10:06:19.660736       1 tlsconfig.go:240] "Starting DynamicServingCertificateController"
I0526 10:06:19.661221       1 run_server.go:80] apiserver is ready.
I0526 10:06:19.761160       1 shared_informer.go:262] Caches are synced for client-ca::kube-system::extension-apiserver-authentication::requestheader-client-ca-file
I0526 10:06:19.761194       1 shared_informer.go:262] Caches are synced for client-ca::kube-system::extension-apiserver-authentication::client-ca-file
I0526 10:06:19.761256       1 shared_informer.go:262] Caches are synced for RequestHeaderAuthRequestController

I followed the guide linked above by @xpuska513 too, and hostNetwork is false (well, not set). I manually installed the CRDs because I was getting errors about the annotation length exceeding the maximum.

junglie85 avatar May 26 '23 11:05 junglie85

because I was getting errors about the annotation length exceeding the maximum.

FWIW, this usually comes from using kubectl apply, since kubectl apply adds the last-applied-configuration annotation. You should be able to do kubectl create and kubectl replace instead in order to avoid that, and that's what we currently recommend.
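
For example (using the upstream manifest purely as an illustration; substitute whatever manifest you were applying):

# First-time install, without the last-applied-configuration annotation:
kubectl create -f https://raw.githubusercontent.com/projectcalico/calico/v3.25.1/manifests/tigera-operator.yaml

# Subsequent updates:
kubectl replace -f https://raw.githubusercontent.com/projectcalico/calico/v3.25.1/manifests/tigera-operator.yaml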

caseydavenport avatar May 30 '23 16:05 caseydavenport

I am having this issue as well.

Diagnostics:

I began by installing Calico on an AWS EKS cluster. I used the Helm chart at v3.25.1. I skipped the Customize Helm Chart section of the documentation, because the AWS documentation (https://docs.aws.amazon.com/eks/latest/userguide/calico.html#calico-install) brings you directly to this section, so I did not bother reading the previous sections. As a result, my initial installation was without installation.kubernetesProvider: EKS.

My initial installation succeeded, and I was about to proceed with the remainder of the EKS documentation. However, I realized I did not like the name I had chosen for the Helm installation: I chose the namespace calico instead of tigera-operator, and I wanted to match the AWS documentation. As a result, I attempted to delete everything and reinstall the Helm chart. Using tigera-operator for the namespace was not allowed, possibly due to a bug in the Helm chart, and I gave up and went back to using the calico namespace.

I do not know where things went wrong after this point. I don't recall if I attempted to helm uninstall or what. But I do remember attempting to delete the namespaces and getting into situations where a namespace got stuck deleting due to various finalizers. I searched for the various stuck entities and deleted them, and I was ultimately able to get the namespaces to delete. I believe I ran kubectl get ns calico-system -o yaml and it warned me it was stuck deleting service accounts.

Current Symptoms

I can sort of delete and reinstall Calico. If I delete Calico, even using helm uninstall, things get strangely stuck. The calico and calico-apiserver namespaces will delete, but calico-system remains. If I run kubectl delete ns calico-system, that gets stuck due to the finalizers. I can then describe the namespace, where it warns me about the serviceaccounts. If I delete the finalizers for the serviceaccounts, the calico-system namespace will finally delete. I can then delete the calico namespace.

There are definitely left-over resources on my K8S cluster. I found a tool, kubectl really get all, which helps show the additional resources. I had a strong suspicion that my installation was corrupt and that the best fix would be to literally rebuild my entire cluster, but that is really not a good idea operationally. When I have live customers, we cannot be expected to rebuild the entire cluster if Calico has an issue.

I tried to delete all leftover resources to see if I could get a reinstall working.

# List out all resources in your cluster
~/kubectl-really-get-all --all-namespaces > /tmp/tmp

# Identify Calico components
cat /tmp/tmp | grep -iE "(tigera|calico)"

Once I have the list of Calico components, I can pipe the output into xargs to delete everything:

# DANGEROUS! DO NOT DO UNLESS YOU KNOW WHAT YOU ARE DOING!
 ... | awk '{print $1}' | xargs -L1 -I input bash -c "kubectl delete input"

This initially kept getting stuck, so I needed to manually modify the items that got stuck and remove the finalizer, e.g. kubectl edit clusterrole.rbac.authorization.k8s.io/calico-node.

I then verified that everything was completely deleted using kubectl really get all. However, after reinstallation, Calico came online, but the calico-system namespace got stuck in a terminating state immediately after it was created:

status:
  conditions:
  - lastTransitionTime: "2023-06-23T16:29:14Z"
    message: 'Discovery failed for some groups, 1 failing: unable to retrieve the
      complete list of server APIs: projectcalico.org/v3: the server is currently
      unable to handle the request'
    reason: DiscoveryFailed
    status: "True"
    type: NamespaceDeletionDiscoveryFailure
  - lastTransitionTime: "2023-06-23T16:29:14Z"
    message: All legacy kube types successfully parsed
    reason: ParsedGroupVersions
    status: "False"
    type: NamespaceDeletionGroupVersionParsingFailure
  - lastTransitionTime: "2023-06-23T16:29:14Z"
    message: All content successfully deleted, may be waiting on finalization
    reason: ContentDeleted
    status: "False"
    type: NamespaceDeletionContentFailure
  - lastTransitionTime: "2023-06-23T16:29:14Z"
    message: All content successfully removed
    reason: ContentRemoved
    status: "False"
    type: NamespaceContentRemaining
  - lastTransitionTime: "2023-06-23T16:29:14Z"
    message: All content-preserving finalizers finished
    reason: ContentHasNoFinalizers
    status: "False"
    type: NamespaceFinalizersRemaining
  phase: Terminating

At this point, I'm not sure how to clean up my system. I may try it one more time, but I may be stuck with a VPC + cluster rebuild.

sig-piskule avatar Jun 23 '23 16:06 sig-piskule