
Duplicate metric on the linkerd-proxy /metrics endpoint of linkerd-injected pods after upgrading from 2025.2.1 to 2025.3.2

Open mrtworo opened this issue 9 months ago • 5 comments

What is the issue?

Prometheus, while scraping linkerd-proxy metrics, logs "Error on ingesting samples with different value but same timestamp" for various linkerd-injected pods. Checking the /metrics endpoint of one of the affected targets shows that inbound_http_authz_allow_total is duplicated; logs attached.

The first warnings in the Prometheus logs appeared right after the upgrade from 2025.2.1 to 2025.3.2.

How can it be reproduced?

It appears to be triggered by the upgrade from 2025.2.1 to 2025.3.2: the first log entries with the problem appeared right after the new version of the chart was applied.

Pods whose linkerd-proxy container is from the previous version, i.e. cr.l5d.io/linkerd/proxy:edge-25.2.1, have the problem; when they are removed and recreated with cr.l5d.io/linkerd/proxy:edge-25.3.2, there are no duplicated metrics.
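
For anyone trying to identify which pods are still running the older proxy image, the sketch below may help (it assumes kubectl access; the image tag is the one from this report, and -A can be narrowed to a specific namespace):

# List pods whose linkerd-proxy container still runs the edge-25.2.1 image.
kubectl get pods -A \
  -o jsonpath='{range .items[*]}{.metadata.namespace}{"\t"}{.metadata.name}{"\t"}{.spec.containers[?(@.name=="linkerd-proxy")].image}{"\n"}{end}' \
  | grep 'proxy:edge-25.2.1'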

Logs, error output, etc

Prometheus logs:

prometheus time=2025-03-18T13:30:12.599Z level=WARN source=scrape.go:1884 msg="Error on ingesting samples with different value but same timestamp" component="scrape manager" scrape_pool=podMonitor/linkerd/linkerd-proxy/0 target=http://10.11.71.179:4191/metrics num_dropped=1

Metrics endpoint:

inbound_http_authz_allow_total{target_addr="10.11.85.144:3055",target_ip="10.11.85.144",target_port="3055",srv_group="",srv_kind="default",srv_name="all-unauthenticated",route_group="",route_kind="default",route_name="default",authz_group="",authz_kind="default",authz_name="all-unauthenticated",tls="true",client_id="ingress-nginx.ingress-nginx-public.serviceaccount.identity.linkerd.cluster.local"} 5060
inbound_http_authz_allow_total{target_addr="10.11.85.144:3055",target_ip="10.11.85.144",target_port="3055",srv_group="",srv_kind="default",srv_name="all-unauthenticated",route_group="",route_kind="default",route_name="default",authz_group="",authz_kind="default",authz_name="all-unauthenticated",tls="true",client_id="ingress-nginx.ingress-nginx-public.serviceaccount.identity.linkerd.cluster.local"} 13459
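
A quick way to confirm the duplication is to strip the sample values and look for repeated label sets; a minimal sketch (assuming access to the proxy's admin port on 4191, e.g. via kubectl port-forward):

# Keep only metric name + label set (drop the value), then print any label set seen more than once.
curl -s http://localhost:4191/metrics \
  | grep '^inbound_http_authz_allow_total' \
  | awk '{print $1}' \
  | sort | uniq -d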

output of linkerd check -o short

n/a

Environment

Kubernetes Version: v1.32.0-eks-2e66e76
Cluster Environment: AWS
Host OS: Bottlerocket OS 1.34.0 (aws-k8s-1.32)
Linkerd version: edge 2025.3.2

Possible solution

As a workaround, if this is indeed related to the upgrade: restart the affected pods so they are re-injected with the newer proxy.
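
A sketch of that workaround (the deployment name and namespace are placeholders): rolling the workload recreates its pods, so they pick up the newer proxy at injection time.

# Recreate the pods so they are injected with the current proxy version.
kubectl rollout restart deployment/my-app -n my-namespace
# Wait for the rollout to complete before re-checking the metrics endpoint.
kubectl rollout status deployment/my-app -n my-namespace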

Additional context

No response

Would you like to work on fixing this bug?

None

mrtworo avatar Mar 18 '25 14:03 mrtworo

Do you happen to have the proxy_build_info metric for this pod?
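
(For anyone checking the same thing, one way to pull this metric is via the proxy's admin port; a sketch assuming port-forward access, with a placeholder pod name and namespace:)

# In one terminal: forward the proxy admin port.
kubectl port-forward pod/my-app-pod -n my-namespace 4191:4191
# In another: grab proxy_build_info from the metrics endpoint.
curl -s http://localhost:4191/metrics | grep '^proxy_build_info'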

olix0r avatar Mar 19 '25 01:03 olix0r

@olix0r sure, are you looking for the version? It was 2.280.0 for the affected pods. Additionally, I wanted to underline that this seems to be just a side effect of the upgrade and is trivial to resolve; however, since I didn't find anything about it in the release notes, I thought it would be prudent to report such unexpected behaviour in case there is something more to it :)

mrtworo avatar Mar 19 '25 08:03 mrtworo

hi there @mrtworo, thank you for filing this issue.

i've tried to reproduce this issue using this proxy version, but when i curl the proxy's metrics endpoint i do not see this metric duplicated:

; curl localhost:4191/metrics | grep inbound_http_authz_allow_total > inbound_http_authz_allow_total.txt
; wc -l inbound_http_authz_allow_total.txt
5 inbound_http_authz_allow_total.txt
; uniq inbound_http_authz_allow_total.txt | wc -l
5

i am relieved to hear that this was trivial to resolve, and appreciate you taking the time to file this bug report.

if i can ask, were these errors recurring consistently after upgrading, or were they specific to the time frame when you upgraded from 2025.2.1 to 2025.3.2?

cratelyn avatar Mar 19 '25 20:03 cratelyn

@cratelyn happy to help, the errors in the Prometheus logs started right after the resources generated by the new chart were applied in our cluster:

  • pods present during the chart upgrade, injected with 2025.2.1 and not restarted, began to expose the duplicated metrics at that time and kept doing so consistently until we noticed a couple of hours later and deleted them, so they were re-injected with 2025.3.2
  • pods that were restarted due to other activities and injected with the new proxy were fine

mrtworo avatar Mar 20 '25 10:03 mrtworo

Reporting the same thing here, except we went from "2025.2.3" to "2025.3.4".

time=2025-04-04T11:16:23.018Z level=WARN source=scrape.go:1884 msg="Error on ingesting samples with different value but same timestamp" component="scrape manager" scrape_pool=podMonitor/linkerd/linkerd-proxy/0 target=http://10.16.3.218:4191/metrics num_dropped=1
time=2025-04-04T11:16:33.018Z level=WARN source=scrape.go:1884 msg="Error on ingesting samples with different value but same timestamp" component="scrape manager" scrape_pool=podMonitor/linkerd/linkerd-proxy/0 target=http://10.16.3.218:4191/metrics num_dropped=1
time=2025-04-04T11:16:43.017Z level=WARN source=scrape.go:1884 msg="Error on ingesting samples with different value but same timestamp" component="scrape manager" scrape_pool=podMonitor/linkerd/linkerd-proxy/0 target=http://10.16.3.218:4191/metrics num_dropped=1
time=2025-04-04T11:16:53.018Z level=WARN source=scrape.go:1884 msg="Error on ingesting samples with different value but same timestamp" component="scrape manager" scrape_pool=podMonitor/linkerd/linkerd-proxy/0 target=http://10.16.3.218:4191/metrics num_dropped=1

jseiser avatar Apr 04 '25 11:04 jseiser

Seeing this after upgrading to 2025.4.4 from 2024.11.8.

ts=2025-06-26T04:29:05.868Z caller=scrape.go:1820 level=warn component="scrape manager" scrape_pool=linkerd-proxy target=http://10.1.3.151:4191/metrics msg="Error on ingesting samples with different value but same timestamp" num_dropped=1

As suggested, re-launching the pods so they get injected with the new proxy version seems to fix it.
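
If it's useful to others, one way to verify that every injected pod is now running the new proxy is the linkerd CLI's proxy version report (a sketch; it assumes the linkerd CLI is installed and pointed at the right cluster):

# Reports the proxy versions running in the cluster alongside the client/server versions.
linkerd version --proxy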

cmartell-at-m42 avatar Jun 26 '25 04:06 cmartell-at-m42

after further investigation, i was able to reproduce this, and have identified a fix. more to come soon, i will follow up when a pull request is in review, and when an edge release including a fix is released. thank you all!

cratelyn avatar Jun 30 '25 14:06 cratelyn

this is fixed in linkerd/linkerd2-proxy#3987! this should be included in an edge release this afternoon.

cratelyn avatar Jul 02 '25 16:07 cratelyn

this is fixed in https://github.com/linkerd/linkerd2/releases/tag/edge-25.7.1.

it's worth pointing out that because of the edge release paradigm that Linkerd follows, this issue will persist when upgrading from versions prior to 2025.3.2.

this issue did, however, surface two issues with our metric labeling, as outlined in linkerd/linkerd2-proxy#3987. that fix will ensure that these duplicate metrics are not encountered again in the future. 🙂

cratelyn avatar Jul 02 '25 18:07 cratelyn

@cratelyn when you say "this issue will persist when upgrading from versions prior to 2025.3.2", is there anything an end user needs to do when upgrading to circumvent this issue? Or should it be taken care of automatically upon upgrading to edge 2025.7.1 or later?

alekhrycaiko avatar Oct 21 '25 19:10 alekhrycaiko