Some pods crash when I try to upgrade the Linkerd certificates.

Open sreeyadlapati opened this issue 3 years ago • 7 comments

What is the issue?

I tried to upgrade the trust anchor certificate and the issuer certificate for Linkerd. I am getting the errors below for some pods.

When I run linkerd check --proxy, I see this message:

linkerd-control-plane-proxy
---------------------------
\ The "linkerd-controller-78b96b6f94-sgstq" pod is not running 


× viz extension proxies are healthy
    Some pods do not have the current trust bundle and must be restarted:
        * grafana-848bd95ff-n8bfl
        * metrics-api-7fdd9c4776-h9r67
        * prometheus-6c665d96d9-pbh9m
        * tap-7bf49cd67b-qz8ck
        * tap-injector-7d6dfb5698-jtchs
        * web-75c64d4849-f7qph
    see https://linkerd.io/2.11/checks/#l5d-viz-proxy-healthy for hints

And when I run

kubectl -n linkerd get pods

NAME                                     READY   STATUS             RESTARTS   AGE
linkerd-controller-5cdd4c5c8-gq5sp       2/2     Running            0          48d
linkerd-controller-78b96b6f94-sgstq      0/2     CrashLoopBackOff   331        20h
linkerd-destination-6557cfd654-v257d     4/4     Running            0          20h
linkerd-identity-6567c674c8-cqmwx        2/2     Running            0          20h
linkerd-proxy-injector-8d6dd4bf7-l8v7v   2/2     Running            0          20h
linkerd-sp-validator-6c464858d5-cscrp    0/2     CrashLoopBackOff   334        20h
linkerd-sp-validator-6d9f7d4685-44b2z    2/2     Running            0          48d

The pods are crashing. Can someone guide me on how to resolve this?

How can it be reproduced?

When I run linkerd check --proxy

Logs, error output, etc

included in the issue

output of linkerd check -o short

Linkerd core checks
===================

linkerd-version
---------------
‼ cli is up-to-date
    is running version 2.11.1 but the latest stable version is 2.11.2
    see https://linkerd.io/2.11/checks/#l5d-version-cli for hints

control-plane-version
---------------------
‼ control plane is up-to-date
    is running version 2.11.1 but the latest stable version is 2.11.2
    see https://linkerd.io/2.11/checks/#l5d-version-control for hints

linkerd-control-plane-proxy
---------------------------
× control plane proxies are healthy
    The "linkerd-controller-78b96b6f94-sgstq" pod is not running
    see https://linkerd.io/2.11/checks/#l5d-cp-proxy-healthy for hints

Status check results are ×

Linkerd extensions checks
=========================

linkerd-viz
-----------
× viz extension proxies are healthy
    Some pods do not have the current trust bundle and must be restarted:
        * grafana-848bd95ff-n8bfl
        * metrics-api-7fdd9c4776-h9r67
        * prometheus-6c665d96d9-pbh9m
        * tap-7bf49cd67b-qz8ck
        * tap-injector-7d6dfb5698-jtchs
        * web-75c64d4849-f7qph
    see https://linkerd.io/2.11/checks/#l5d-viz-proxy-healthy for hints

Status check results are ×

Environment

ST

Possible solution

No response

Additional context

No response

Would you like to work on fixing this bug?

yes

sreeyadlapati avatar Jun 14 '22 15:06 sreeyadlapati

I tried to upgrade the trust anchor certificate and issuer certificate for linkerd.

Can you explain in more detail what steps you took?

linkerd-controller-78b96b6f94-sgstq 0/2 CrashLoopBackOff 331 20h

You can use kubectl describe and kubectl logs to get more information about the reason the pod is crashing.
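For example, here is a minimal sketch, assuming the pod lives in the linkerd namespace (as in the kubectl -n linkerd get pods output above). Note that kubectl describe expects the resource type (pod) before the name, and both commands need the namespace, otherwise kubectl looks in the default namespace. The crash_info helper is a hypothetical name used only for illustration; it is not part of kubectl or linkerd.

```shell
# Hypothetical helper (not part of kubectl or linkerd): show why a pod is crashing.
# $1 = namespace, $2 = pod name
crash_info() {
  # describe needs the resource type ("pod") before the pod name;
  # the Events section at the end of the output usually names the failure.
  kubectl -n "$1" describe pod "$2"
  # Logs from the previous (crashed) run of every container in the pod.
  kubectl -n "$1" logs "$2" --all-containers=true --previous
}
```

Usage: crash_info linkerd linkerd-controller-78b96b6f94-sgstq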

olix0r avatar Jun 14 '22 15:06 olix0r

I have generated a new trust anchor certificate and a new issuer certificate and upgraded both of them using the basic commands shown in this document: https://linkerd.io/2.10/tasks/generate-certificates/#generating-the-certificates-with-step

I tried running kubectl describe on that pod, but I am getting this:

$ kubectl describe linkerd-controller-78b96b6f94-sgstq
error: the server doesn't have a resource type "linkerd-controller-78b96b6f94-sgstq"

$ kubectl logs linkerd-controller-78b96b6f94-sgstq
Error from server (NotFound): pods "linkerd-controller-78b96b6f94-sgstq" not found

sreeyadlapati avatar Jun 14 '22 15:06 sreeyadlapati

There are instructions for rotating Linkerd certificates here: https://linkerd.io/2.11/tasks/manually-rotating-control-plane-tls-credentials/

adleong avatar Jun 14 '22 16:06 adleong

I have followed the same steps for upgrading the certificate, but I am getting the errors mentioned above.

sreeyadlapati avatar Jun 14 '22 18:06 sreeyadlapati

Can anyone give me some suggestions on the issue I posted above?

sreeyadlapati avatar Jun 16 '22 12:06 sreeyadlapati

@sreeyadlapati it looks like an issue that I faced and I've updated the docs: https://linkerd.io/2.11/tasks/manually-rotating-control-plane-tls-credentials/#removing-the-old-trust-anchor.

We can now remove the old trust anchor from the trust bundle we created earlier.

NOTE: Before the action, it is necessary to explicitly rollout all deployments in the linkerd namespace:

kubectl -n linkerd rollout restart deployments

Try restoring the old CA to the trust bundle and restarting all the pods in the linkerd namespace. Then you can remove the old CA and restart the pods once again. It helped me, and maybe it will help you.
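The restart step can be sketched roughly as below. This is only an illustration: restart_meshed is a hypothetical helper name, and linkerd-viz is assumed to be the namespace of the viz extension (its default); adjust it if yours differs. The kubectl command inside is the same one the docs note quotes.

```shell
# Hypothetical helper (not a linkerd command): restart every deployment in the
# given namespaces so the proxies pick up the current trust bundle.
restart_meshed() {
  for ns in "$@"; do
    kubectl -n "$ns" rollout restart deployments
  done
}
```

Usage: once the bundle with both the old and the new trust anchor is in place, run restart_meshed linkerd linkerd-viz, confirm with kubectl -n linkerd get pods and linkerd check --proxy that the pods are healthy, and only then remove the old anchor and run the restart again.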

aatarasoff avatar Jun 17 '22 10:06 aatarasoff

This issue has been automatically marked as stale because it has not had recent activity. It will be closed in 14 days if no further activity occurs. Thank you for your contributions.

stale[bot] avatar Sep 15 '22 15:09 stale[bot]