Linkerd route level RPS not coming up

Open mayank-ag-dev opened this issue 1 year ago • 9 comments

What is the issue?

We deployed Linkerd stable-2.14.0 on GKE v1.24 and configured a ServiceProfile for an application. The routes are listed, but no RPS is reported for any of them.

How can it be reproduced?

GKE v1.24
Linkerd stable-2.14.0

Logs, error output, etc

ROUTE      SERVICE  SUCCESS  RPS  LATENCY_P50  LATENCY_P95  LATENCY_P99
[DEFAULT]  svc1     -        -    -            -            -
healthz    svc1     -        -    -            -            -
version    svc1     -        -    -            -            -

output of linkerd check -o short

linkerd-version
---------------
‼ cli is up-to-date
    is running version 2.14.0 but the latest stable version is 2.14.5
    see https://linkerd.io/2.14/checks/#l5d-version-cli for hints

control-plane-version
---------------------
‼ control plane is up-to-date
    is running version 2.14.0 but the latest stable version is 2.14.5
    see https://linkerd.io/2.14/checks/#l5d-version-control for hints

linkerd-control-plane-proxy
---------------------------
‼ control plane proxies are up-to-date
    some proxies are not running the current version:
	* linkerd-destination-d995f46dc-gcvwq (stable-2.14.0)
	* linkerd-identity-86c6f76f6c-p6k52 (stable-2.14.0)
	* linkerd-proxy-injector-6fc56bcd48-2x9rs (stable-2.14.0)
    see https://linkerd.io/2.14/checks/#l5d-cp-proxy-version for hints

linkerd-viz
-----------
‼ linkerd-viz pods are injected
    could not find proxy container for metrics-api-f46599848-6j2lz pod
    see https://linkerd.io/2.14/checks/#l5d-viz-pods-injection for hints
‼ viz extension pods are running
    container "linkerd-proxy" in pod "metrics-api-f46599848-6j2lz" is not ready
    see https://linkerd.io/2.14/checks/#l5d-viz-pods-running for hints
‼ viz extension proxies are healthy
    no "linkerd-proxy" containers found in the "linkerd" namespace
    see https://linkerd.io/2.14/checks/#l5d-viz-proxy-healthy for hints

Status check results are √

Environment

GKE v1.24 Linkerd stable-2.14.0

Possible solution

No response

Additional context

No response

Would you like to work on fixing this bug?

yes

mayank-ag-dev avatar Nov 30 '23 06:11 mayank-ag-dev

Hey @mayank-ag-dev! Those errors from linkerd check are very concerning – they look an awful lot like linkerd-viz isn't set up correctly. Maybe uninstall and reinstall it?

Assuming that you clear the Viz errors and it's still not working, we'd like to see the Service and ServiceProfile for at least one of these workloads... thanks!

kflynn avatar Nov 30 '23 19:11 kflynn

Hey @kflynn, I resolved the linkerd-viz errors. Here are the ServiceProfile and Service snippets:

apiVersion: linkerd.io/v1alpha2
kind: ServiceProfile
metadata:
  name: podinfo-svc.podinfo.svc.cluster.local
  namespace: podinfo
spec:
  routes:
    - name: health-check
      condition:
        method: GET
        pathRegex: /healthz
    - name: version
      condition:
        method: GET
        pathRegex: /version
---
apiVersion: v1
kind: List
items:
- apiVersion: v1
  kind: Service
  metadata:
    name: podinfo-svc
    namespace: podinfo
  spec:
    ports:
    - name: http
      port: 9898
      protocol: TCP
      targetPort: http
    - name: grpc
      port: 9999
      protocol: TCP
      targetPort: grpc
    selector:
      app: podinfo
    type: ClusterIP
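
One thing that can be ruled out from these two manifests is a naming mismatch: a ServiceProfile is expected to be named after the fully qualified DNS name of the Service it describes. The following is a minimal sketch of that check, not part of Linkerd itself. The helper names `expected_profile_name` and `profile_matches_service` are hypothetical, the manifests are reduced to plain dictionaries, and the default cluster domain `cluster.local` is an assumption that may differ on a given cluster.

```python
def expected_profile_name(service_name, namespace, cluster_domain="cluster.local"):
    """Build the FQDN that a ServiceProfile for this Service would be named after.

    cluster_domain defaults to "cluster.local", which is an assumption;
    clusters can be configured with a different domain.
    """
    return f"{service_name}.{namespace}.svc.{cluster_domain}"


def profile_matches_service(profile, service, cluster_domain="cluster.local"):
    """Return True if the ServiceProfile name equals the Service's FQDN."""
    svc_meta = service["metadata"]
    expected = expected_profile_name(
        svc_meta["name"], svc_meta["namespace"], cluster_domain
    )
    return profile["metadata"]["name"] == expected


# Metadata copied from the manifests posted in this thread.
profile = {
    "metadata": {
        "name": "podinfo-svc.podinfo.svc.cluster.local",
        "namespace": "podinfo",
    }
}
service = {"metadata": {"name": "podinfo-svc", "namespace": "podinfo"}}

if __name__ == "__main__":
    print(profile_matches_service(profile, service))
```

For the manifests above the names line up, so under these assumptions a misnamed profile does not look like the cause of the missing RPS.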

mayank-ag-dev avatar Dec 01 '23 16:12 mayank-ag-dev

And after the Viz errors are resolved, it's still not working?

kflynn avatar Dec 01 '23 17:12 kflynn

Yes, it's still not working. Were there any configuration changes for ServiceProfiles in stable-2.14.0?

kubernetes-api
--------------
√ can initialize the client
√ can query the Kubernetes API

kubernetes-version
------------------
√ is running the minimum Kubernetes API version

linkerd-existence
-----------------
√ 'linkerd-config' config map exists
√ heartbeat ServiceAccount exist
√ control plane replica sets are ready
√ no unschedulable pods
√ control plane pods are ready
√ cluster networks contains all node podCIDRs
√ cluster networks contains all pods
√ cluster networks contains all services

linkerd-config
--------------
√ control plane Namespace exists
√ control plane ClusterRoles exist
√ control plane ClusterRoleBindings exist
√ control plane ServiceAccounts exist
√ control plane CustomResourceDefinitions exist
√ control plane MutatingWebhookConfigurations exist
√ control plane ValidatingWebhookConfigurations exist
√ proxy-init container runs as root user if docker container runtime is used

linkerd-identity
----------------
√ certificate config is valid
√ trust anchors are using supported crypto algorithm
√ trust anchors are within their validity period
√ trust anchors are valid for at least 60 days
√ issuer cert is using supported crypto algorithm
√ issuer cert is within its validity period
√ issuer cert is valid for at least 60 days
√ issuer cert is issued by the trust anchor

linkerd-webhooks-and-apisvc-tls
-------------------------------
√ proxy-injector webhook has valid cert
√ proxy-injector cert is valid for at least 60 days
√ sp-validator webhook has valid cert
√ sp-validator cert is valid for at least 60 days
√ policy-validator webhook has valid cert
√ policy-validator cert is valid for at least 60 days

linkerd-version
---------------
√ can determine the latest version
‼ cli is up-to-date
    is running version 2.14.0 but the latest stable version is 2.14.5
    see https://linkerd.io/2.14/checks/#l5d-version-cli for hints

control-plane-version
---------------------
√ can retrieve the control plane version
‼ control plane is up-to-date
    is running version 2.14.0 but the latest stable version is 2.14.5
    see https://linkerd.io/2.14/checks/#l5d-version-control for hints
√ control plane and cli versions match

linkerd-control-plane-proxy
---------------------------
√ control plane proxies are healthy
‼ control plane proxies are up-to-date
    some proxies are not running the current version:
	* linkerd-destination-64fd9c9866-pbzxt (stable-2.14.0)
	* linkerd-identity-6c5fc457db-pwl7f (stable-2.14.0)
	* linkerd-proxy-injector-5d85b4686f-mg77v (stable-2.14.0)
    see https://linkerd.io/2.14/checks/#l5d-cp-proxy-version for hints
√ control plane proxies and cli versions match

linkerd-viz
-----------
√ linkerd-viz Namespace exists
√ can initialize the client
√ linkerd-viz ClusterRoles exist
√ linkerd-viz ClusterRoleBindings exist
√ tap API server has valid cert
√ tap API server cert is valid for at least 60 days
√ tap API service is running
√ linkerd-viz pods are injected
√ viz extension pods are running
√ viz extension proxies are healthy
‼ viz extension proxies are up-to-date
    some proxies are not running the current version:
	* metrics-api-75f76fbd65-44wv8 (stable-2.14.0)
	* prometheus-7c74c74478-7fxkz (stable-2.14.0)
	* tap-6665794f66-f6ksl (stable-2.14.0)
	* tap-injector-74f66f65d5-zkw9v (stable-2.14.0)
	* web-78c46f4b57-8wx9z (stable-2.14.0)
    see https://linkerd.io/2.14/checks/#l5d-viz-proxy-cp-version for hints
√ viz extension proxies and cli versions match
√ prometheus is installed and configured correctly
√ viz extension self-check

Status check results are √

mayank-ag-dev avatar Dec 01 '23 17:12 mayank-ag-dev

@mayank-ag-dev I think the biggest question here is whether you're using ServiceProfiles or HTTPRoutes. For per-route metrics at the moment, you need to be using ServiceProfiles.
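
For readers following along, here is a rough sketch of how a request might be attributed to a ServiceProfile route, using the two routes posted earlier in this thread. It is an illustration only and not Linkerd's implementation. It assumes routes are tried in order, that `pathRegex` has to match the whole path (modelled here with `re.fullmatch`), and that unmatched requests are counted under `[DEFAULT]`. Python's regex syntax is also not identical to the proxy's, so treat the results as approximate. The `classify` function is hypothetical.

```python
import re

# Route conditions copied from the ServiceProfile posted in this thread.
ROUTES = [
    {"name": "health-check", "method": "GET", "pathRegex": "/healthz"},
    {"name": "version", "method": "GET", "pathRegex": "/version"},
]


def classify(method, path, routes=ROUTES):
    """Return the name of the first route whose method and path regex match.

    Assumption: the regex must match the entire path, so "/healthz/live"
    does not match "/healthz". Requests that match no route fall through
    to "[DEFAULT]".
    """
    for route in routes:
        if route["method"] == method and re.fullmatch(route["pathRegex"], path):
            return route["name"]
    return "[DEFAULT]"


if __name__ == "__main__":
    for method, path in [("GET", "/healthz"), ("GET", "/healthz/live"), ("POST", "/version")]:
        print(method, path, "->", classify(method, path))
```

Under these assumptions, traffic that misses every route would still be counted under `[DEFAULT]`. In the output posted at the top of the issue even `[DEFAULT]` shows no data, which suggests the problem is not limited to the route conditions.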

kflynn avatar Dec 20 '23 22:12 kflynn

@kflynn We are using serviceProfiles for HTTPRoutes.

mayank-ag-dev avatar Dec 22 '23 08:12 mayank-ag-dev

@mayank-ag-dev 🤦‍♂️ So sorry to ask you to confirm ServiceProfiles when you'd already posted a ServiceProfile! Let me poke a little more into this.

kflynn avatar Jan 04 '24 15:01 kflynn

@kflynn Any update on this? This is having a major impact on our observability.

akashsethiya avatar Jan 31 '24 08:01 akashsethiya

So far I haven't managed to reproduce this. 🙁 Are you on our Slack? If so, I'd like to connect there and try a few things with you.

kflynn avatar Mar 07 '24 16:03 kflynn

This issue has been automatically marked as stale because it has not had recent activity. It will be closed in 14 days if no further activity occurs. Thank you for your contributions.

stale[bot] avatar Jun 07 '24 03:06 stale[bot]