calico
calico-kube-controllers pod is restarting: Unauthorized
The Calico kube-controllers pod is going into CrashLoopBackOff and restarting.
Error
2022-05-19 13:33:46.822 [INFO][1] main.go 92: Loaded configuration from environment config=&config.Config{LogLevel:"info", WorkloadEndpointWorkers:1, ProfileWorkers:1, PolicyWorkers:1, NodeWorkers:1, Kubeconfig:"", DatastoreType:"kubernetes"}
W0519 13:33:46.828542 1 client_config.go:615] Neither --kubeconfig nor --master was specified. Using the inClusterConfig. This might not work.
2022-05-19 13:33:46.831 [INFO][1] main.go 113: Ensuring Calico datastore is initialized
2022-05-19 13:33:46.856 [ERROR][1] client.go 261: Error getting cluster information config ClusterInformation="default" error=connection is unauthorized: Unauthorized
2022-05-19 13:33:46.856 [FATAL][1] main.go 118: Failed to initialize Calico datastore error=connection is unauthorized: Unauthorized
Context
Kubernetes is installed with the Calico CNI. All pods were up and running. The system was left idle for 7 days. We observed that Calico kube-controllers restarted multiple times with the following error:
[ERROR][1] client.go 261: Error getting cluster information config ClusterInformation="default" error=connection is unauthorized: Unauthorized
[FATAL][1] main.go 118: Failed to initialize Calico datastore error=connection is unauthorized: Unauthorized
Environment
- Calico version: v3.19.2
- K8s Version: v1.21.5
- Operating System and version: Ubuntu
2022-05-19 13:33:46.856 [ERROR][1] client.go 261: Error getting cluster information config ClusterInformation="default" error=connection is unauthorized: Unauthorized
I'd recommend checking your API server logs to see if they contain more information as to why the connection was deemed unauthorized.
Thanks a lot for the response. Below is the error in the apiserver log:
E0519 13:34:15.705018 1 claims.go:126] unexpected validation error: *errors.errorString
E0519 13:34:15.705174 1 authentication.go:63] "Unable to authenticate the request" err="[invalid bearer token, Token could not be validated.]"
Could you please give me some pointers on when this can happen? The cluster was in this state for some time. Later, the calico-kube-controllers pod came back up and was running. What might have gone wrong during that time?
It's hard to say, Calico kube-controllers just uses the token provided to it by Kubernetes, so if we were using an invalid token it's probably because Kubernetes failed to update the token mounted into the pod or because the kubernetes client code failed to notice that the token was updated.
We brought up this setup using kubespray and left it idle for a few days. Suddenly, we observed that the calico-kube-controllers pod was going down with this error and coming back to Running after some time.
I think kubespray might provide its own version of the Calico manifests? Could you try reinstalling Calico (either the same version or a later one) with our manifests from https://projectcalico.docs.tigera.io/about/about-calico and see if that might change any instability?
@mgleung @caseydavenport,
When the Calico binary starts, it initializes the cluster information using a calicoclient object. That object holds the Calico environment variables and the kubeconfig (if no kubeconfig is present, it uses the in-cluster config). What is this in-cluster config, and where is it defined?
Using that object, with datastoreType set to kubernetes, I can see that it creates the default cluster configuration:
clusterInfo, err := c.ClusterInformation().Get(ctx, globalClusterInfoName, options.GetOptions{})
if err != nil {
	// Create the default config if it doesn't already exist.
	if _, ok := err.(cerrors.ErrorResourceDoesNotExist); ok {
		newClusterInfo := v3.NewClusterInformation()
		newClusterInfo.Name = globalClusterInfoName
		newClusterInfo.Spec.CalicoVersion = calicoVersion
		newClusterInfo.Spec.ClusterType = clusterType
		newClusterInfo.Spec.ClusterGUID = fmt.Sprintf("%s", hex.EncodeToString(uuid.NewV4().Bytes()))
		datastoreReady := true
		newClusterInfo.Spec.DatastoreReady = &datastoreReady
		_, err = c.ClusterInformation().Create(ctx, newClusterInfo, options.SetOptions{})
		if err != nil {
			if _, ok := err.(cerrors.ErrorResourceAlreadyExists); ok {
				log.Info("Failed to create global ClusterInformation; another node got there first.")
				time.Sleep(1 * time.Second)
				continue
			}
			log.WithError(err).WithField("ClusterInformation", newClusterInfo).Errorf("Error creating cluster information config")
			return err
		}
	} else {
		log.WithError(err).WithField("ClusterInformation", globalClusterInfoName).Errorf("Error getting cluster information config")
		return err
	}
	break
}
It is failing in the else branch.
I have some questions about how the default cluster configuration is created:
- Does it contact the API server for some parameter?
- How are the keys/certs used by the controller to connect to the API server?
- Are those certs rotated or renewed after some time?
- Where are those certs defined and used?
The key / certs / token used by kube-controllers are all injected into the container by the kubelet on the system, and then interpreted by the Kubernetes golang client when creating a connection. The go client automatically handles any rotation of those credentials once the projection into the container is updated.
You should check the token within the pod to make sure that it's valid. We use in-cluster configuration as described here: https://github.com/kubernetes/client-go/tree/master/examples/in-cluster-client-configuration#authenticating-inside-the-cluster
Given the API server logs are complaining about the token, it might be worth verifying that the token mounted at /var/run/secrets/kubernetes.io/serviceaccount is actually valid, since that's the token the Calico code will use.
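To make the in-cluster configuration concrete, here is a minimal Go sketch of the discovery step, using only the standard library. It is an illustration, not the client-go or Calico code: the helper name apiServerURL is hypothetical, and it assumes the standard KUBERNETES_SERVICE_HOST / KUBERNETES_SERVICE_PORT environment variables and the default service account mount path mentioned above.

```go
package main

import (
	"errors"
	"fmt"
	"net"
	"os"
	"strings"
)

// Files the kubelet projects into a pod that uses a service account.
const (
	tokenFile = "/var/run/secrets/kubernetes.io/serviceaccount/token"
	caFile    = "/var/run/secrets/kubernetes.io/serviceaccount/ca.crt"
)

// apiServerURL (hypothetical helper) builds the API server address from the
// environment variables Kubernetes sets in each container. getenv is passed
// in so the function can be exercised outside a cluster.
func apiServerURL(getenv func(string) string) (string, error) {
	host := getenv("KUBERNETES_SERVICE_HOST")
	port := getenv("KUBERNETES_SERVICE_PORT")
	if host == "" || port == "" {
		return "", errors.New("KUBERNETES_SERVICE_HOST/PORT not set; not running in a pod?")
	}
	return "https://" + net.JoinHostPort(host, port), nil
}

func main() {
	url, err := apiServerURL(os.Getenv)
	if err != nil {
		fmt.Println("error:", err)
		os.Exit(1)
	}
	raw, err := os.ReadFile(tokenFile)
	if err != nil {
		fmt.Println("error reading token:", err)
		os.Exit(1)
	}
	token := strings.TrimSpace(string(raw))
	fmt.Println("API server:", url)
	fmt.Println("CA bundle: ", caFile)
	fmt.Println("token size:", len(token), "bytes")
}
```

Run inside the pod, this shows which address and credential files an in-cluster client would pick up. It does not check whether the token is still accepted by the API server.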
@cpsrujana Any updates?
I'm going to close this for now due to inactivity but please feel free to reopen this issue if you're still experiencing this issue and have diags to provide.
@lmm, can you please reopen the issue? We are still debugging and still facing the issue.
@caseydavenport,
Hi, I tried checking the token validity at the path "/var/lib/kubelet/pods/5cc9962f-818f-4ff1-9824-801963d15893/../token", which is where the pod has mounted /var/run/secrets/kubernetes.io/serviceaccount.
I can see that the token is valid for 1 year from the day it was created. So why is the token reported as invalid or unauthorized when it is valid for a year? Our deployment has been running for the past 48 days and we are seeing 172 restarts.
Below is the decoded token.
{
  "aud": [
    "https://kubernetes.default.svc.cluster.local"
  ],
  "exp": 1688132695,
  "iat": 1656596695,
  "iss": "https://kubernetes.default.svc.cluster.local",
  "kubernetes.io": {
    "namespace": "kube-system",
    "pod": {
      "name": "calico-kube-controllers-76758f8d4c-6l2dr",
      "uid": "5cc9962f-818f-4ff1-9824-801963d15893"
    },
    "serviceaccount": {
      "name": "calico-kube-controllers",
      "uid": "6d8ba93f-ceaf-47f3-a90b-02e7c1d55745"
    },
    "warnafter": 1656600302
  },
  "nbf": 1656596695,
  "sub": "system:serviceaccount:kube-system:calico-kube-controllers"
}
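For anyone who wants to repeat this check, the following is a rough Go sketch that decodes the claims of a service account token and prints its timestamps. The claims struct and the decodeClaims and lifetime helpers are hypothetical names made up for this example. It only inspects the payload and does not verify the signature, so a token whose exp lies in the future can still be rejected by the API server. For the token above, exp - iat is 31536000 seconds (365 days) and warnafter is 3607 seconds after iat.

```go
package main

import (
	"encoding/base64"
	"encoding/json"
	"errors"
	"fmt"
	"os"
	"strings"
	"time"
)

// claims holds only the fields of interest from the decoded token.
type claims struct {
	Iat        int64 `json:"iat"`
	Exp        int64 `json:"exp"`
	Kubernetes struct {
		WarnAfter int64 `json:"warnafter"`
	} `json:"kubernetes.io"`
}

// decodeClaims base64url-decodes the middle (payload) part of a JWT.
// It does NOT verify the signature.
func decodeClaims(token string) (claims, error) {
	var c claims
	parts := strings.Split(strings.TrimSpace(token), ".")
	if len(parts) != 3 {
		return c, errors.New("token is not a three-part JWT")
	}
	payload, err := base64.RawURLEncoding.DecodeString(parts[1])
	if err != nil {
		return c, err
	}
	err = json.Unmarshal(payload, &c)
	return c, err
}

// lifetime is the span between the issued-at and expiry claims.
func lifetime(c claims) time.Duration {
	return time.Duration(c.Exp-c.Iat) * time.Second
}

func main() {
	// Default to the in-pod path; pass another file as the first argument.
	path := "/var/run/secrets/kubernetes.io/serviceaccount/token"
	if len(os.Args) > 1 {
		path = os.Args[1]
	}
	raw, err := os.ReadFile(path)
	if err != nil {
		fmt.Println("error reading token:", err)
		os.Exit(1)
	}
	c, err := decodeClaims(string(raw))
	if err != nil {
		fmt.Println("error decoding token:", err)
		os.Exit(1)
	}
	fmt.Println("issued:    ", time.Unix(c.Iat, 0).UTC())
	fmt.Println("warn after:", time.Unix(c.Kubernetes.WarnAfter, 0).UTC())
	fmt.Println("expires:   ", time.Unix(c.Exp, 0).UTC())
	fmt.Println("lifetime:  ", lifetime(c))
	fmt.Println("expired:   ", time.Now().After(time.Unix(c.Exp, 0)))
}
```

Comparing the issued time of the token on disk with the time of each restart may help show whether the kubelet refreshed the file before the failures started.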
@cpsrujana it's hard to say with just the token what the issue could be here. I would say we probably want to try a few basic things to validate the token:
- Quickly check whether the token is valid by adding it to a curl request to the K8s API:
  curl --cacert ca.crt -H "Authorization: Bearer {token}" https://kubernetes.default/api/v1/namespaces/{namespace}/pods
- Check the token against the token request API (https://kubernetes.io/docs/reference/kubernetes-api/authentication-resources/token-request-v1/)
Hopefully those will turn up some more details. On another note, could you describe your environment a bit more? I know that you mentioned earlier that this is a kubespray setup but are there other details about your environment that you could share with us?
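If curl is not available in the container, the same bearer-token check could be sketched in Go with the standard library. This is an assumption-laden example rather than anything Calico ships: newPodListRequest is a hypothetical helper, the kube-system namespace is taken from the decoded token in this thread, and the pod-list URL is the standard core API path.

```go
package main

import (
	"crypto/tls"
	"crypto/x509"
	"fmt"
	"net/http"
	"os"
	"strings"
)

// Default service account mount inside a pod.
const saDir = "/var/run/secrets/kubernetes.io/serviceaccount"

// newPodListRequest (hypothetical helper) builds a GET request that lists
// pods in a namespace and attaches the token as a bearer credential.
func newPodListRequest(apiServer, namespace, token string) (*http.Request, error) {
	url := fmt.Sprintf("%s/api/v1/namespaces/%s/pods", strings.TrimRight(apiServer, "/"), namespace)
	req, err := http.NewRequest(http.MethodGet, url, nil)
	if err != nil {
		return nil, err
	}
	req.Header.Set("Authorization", "Bearer "+strings.TrimSpace(token))
	return req, nil
}

func main() {
	token, err := os.ReadFile(saDir + "/token")
	if err != nil {
		fmt.Println("error reading token:", err)
		os.Exit(1)
	}
	caPEM, err := os.ReadFile(saDir + "/ca.crt")
	if err != nil {
		fmt.Println("error reading CA bundle:", err)
		os.Exit(1)
	}
	// Trust only the cluster CA, like curl --cacert ca.crt.
	pool := x509.NewCertPool()
	if !pool.AppendCertsFromPEM(caPEM) {
		fmt.Println("error: no certificates found in ca.crt")
		os.Exit(1)
	}
	req, err := newPodListRequest("https://kubernetes.default", "kube-system", string(token))
	if err != nil {
		fmt.Println("error building request:", err)
		os.Exit(1)
	}
	client := &http.Client{Transport: &http.Transport{
		TLSClientConfig: &tls.Config{RootCAs: pool},
	}}
	resp, err := client.Do(req)
	if err != nil {
		fmt.Println("request failed:", err)
		os.Exit(1)
	}
	defer resp.Body.Close()
	fmt.Println("status:", resp.Status)
}
```

A 401 status means the API server rejected the token itself, which would match the "invalid bearer token" log line. A 403 means the token was accepted but RBAC denied the request, and a 200 means both authentication and authorization succeeded.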