
An error occurred while integrating external OpenTelemetry into linkerd2.

Open wxiangliang opened this issue 6 months ago • 1 comments

What is the issue?

Hey. The otel-jaeger-collector doesn't have linkerd.io/inject enabled; it's set to linkerd.io/inject: disabled. I have added linkerd.io/inject: enabled and configured linkerd.io/opaque-ports: "4317,14250" on my application. The application is able to send data to OpenTelemetry, but linkerd2-proxy throws an error.

How can it be reproduced?

jaeger:
  enabled: false
collector:
  enabled: false
webhook:
  collectorSvcAddr: "otel-jaeger-collector.observability.svc.cluster.local:4317"
  collectorSvcAccount: "otel-jaeger-collector"

Logs, error output, etc

INFO ThreadId(01) linkerd2_proxy: release 2.298.0 (83d4eac) by linkerd on 2025-05-21T04:00:41Z
[ 0.010021s] INFO ThreadId(01) linkerd2_proxy::rt: Using single-threaded proxy runtime
[ 0.013849s] INFO ThreadId(01) linkerd2_proxy: Admin interface on 0.0.0.0:4191
[ 0.013877s] INFO ThreadId(01) linkerd2_proxy: Inbound interface on 0.0.0.0:4143
[ 0.013881s] INFO ThreadId(01) linkerd2_proxy: Outbound interface on 127.0.0.1:4140
[ 0.013884s] INFO ThreadId(01) linkerd2_proxy: Tap interface on 0.0.0.0:4190
[ 0.013887s] INFO ThreadId(01) linkerd2_proxy: SNI is default.dev.serviceaccount.identity.linkerd.cluster.local
[ 0.013891s] INFO ThreadId(01) linkerd2_proxy: Local identity is default.dev.serviceaccount.identity.linkerd.cluster.local
[ 0.013909s] INFO ThreadId(01) linkerd2_proxy: Destinations resolved via linkerd-dst-headless.linkerd.svc.cluster.local:8086 (linkerd-destination.linkerd.serviceaccount.identity.linkerd.cluster.local)
[ 0.013916s] INFO ThreadId(01) linkerd2_proxy: Tracing collector at otel-jaeger-collector.observability.svc.cluster.local:4317 (otel-jaeger-collector.observability.serviceaccount.identity.linkerd.cluster.local)
[ 0.016001s] INFO ThreadId(01) dst:controller{addr=linkerd-dst-headless.linkerd.svc.cluster.local:8086}: linkerd_pool_p2c: Adding endpoint addr=10.244.2.147:8086
[ 0.016083s] INFO ThreadId(02) identity:identity{server.addr=linkerd-identity-headless.linkerd.svc.cluster.local:8080}:controller{addr=linkerd-identity-headless.linkerd.svc.cluster.local:8080}: linkerd_pool_p2c: Adding endpoint addr=10.244.2.166:8080
[ 0.016982s] INFO ThreadId(01) policy:controller{addr=linkerd-policy.linkerd.svc.cluster.local:8090}: linkerd_pool_p2c: Adding endpoint addr=10.244.2.147:8090
[ 0.023336s] INFO ThreadId(01) tracing:controller{addr=otel-jaeger-collector.observability.svc.cluster.local:4317}: linkerd_pool_p2c: Adding endpoint addr=10.103.35.26:4317
[ 0.024818s] WARN ThreadId(01) tracing:controller{addr=otel-jaeger-collector.observability.svc.cluster.local:4317}:endpoint{addr=10.103.35.26:4317}: linkerd_reconnect: Failed to connect error=endpoint 10.103.35.26:4317: received corrupt message of type InvalidContentType error.sources=[received corrupt message of type InvalidContentType]
[ 0.033815s] INFO ThreadId(02) daemon:identity: linkerd_app: Certified identity id=default.dev.serviceaccount.identity.linkerd.cluster.local
[ 0.130907s] WARN ThreadId(01) tracing:controller{addr=otel-jaeger-collector.observability.svc.cluster.local:4317}:endpoint{addr=10.103.35.26:4317}: linkerd_reconnect: Failed to connect error=endpoint 10.103.35.26:4317: received corrupt message of type InvalidContentType error.sources=[received corrupt message of type InvalidContentType]
[ 0.351137s] WARN ThreadId(01) tracing:controller{addr=otel-jaeger-collector.observability.svc.cluster.local:4317}:endpoint{addr=10.103.35.26:4317}: linkerd_reconnect: Failed to connect error=endpoint 10.103.35.26:4317: received corrupt message of type InvalidContentType error.sources=[received corrupt message of type InvalidContentType]
[ 0.383602s] INFO ThreadId(01) outbound:proxy{addr=10.98.57.115:3306}:balance{addr=mysql.default.svc.cluster.local:3306}: linkerd_pool_p2c: Adding endpoint addr=10.244.2.156:3306
[ 0.790422s] WARN ThreadId(01) tracing:controller{addr=otel-jaeger-collector.observability.svc.cluster.local:4317}:endpoint{addr=10.103.35.26:4317}: linkerd_reconnect: Failed to connect error=endpoint 10.103.35.26:4317: received corrupt message of type InvalidContentType error.sources=[received corrupt message of type InvalidContentType]
[ 1.293117s] WARN ThreadId(01) tracing:controller{addr=otel-jaeger-collector.observability.svc.cluster.local:4317}:endpoint{addr=10.103.35.26:4317}: linkerd_reconnect: Failed to connect error=endpoint 10.103.35.26:4317: received corrupt message of type InvalidContentType error.sources=[received corrupt message of type InvalidContentType]
[ 1.794508s] WARN ThreadId(01) tracing:controller{addr=otel-jaeger-collector.observability.svc.cluster.local:4317}:endpoint{addr=10.103.35.26:4317}: linkerd_reconnect: Failed to connect error=endpoint 10.103.35.26:4317: received corrupt message of type InvalidContentType error.sources=[received corrupt message of type InvalidContentType]
[ 2.296931s] WARN ThreadId(01) tracing:controller{addr=otel-jaeger-collector.observability.svc.cluster.local:4317}:endpoint{addr=10.103.35.26:4317}: linkerd_reconnect: Failed to connect error=endpoint 10.103.35.26:4317: received corrupt message of type InvalidContentType error.sources=[received corrupt message of type InvalidContentType]
[ 2.798144s] WARN ThreadId(01) tracing:controller{addr=otel-jaeger-collector.observability.svc.cluster.local:4317}:endpoint{addr=10.103.35.26:4317}: linkerd_reconnect: Failed to connect error=endpoint 10.103.35.26:4317: received corrupt message of type InvalidContentType error.sources=[received corrupt message of type InvalidContentType]
[ 3.301213s] WARN ThreadId(01) tracing:controller{addr=otel-jaeger-collector.observability.svc.cluster.local:4317}:endpoint{addr=10.103.35.26:4317}: linkerd_reconnect: Failed to connect error=endpoint 10.103.35.26:4317: received corrupt message of type InvalidContentType error.sources=[received corrupt message of type InvalidContentType]
[ 3.803097s] WARN ThreadId(01) tracing:controller{addr=otel-jaeger-collector.observability.svc.cluster.local:4317}:endpoint{addr=10.103.35.26:4317}: linkerd_reconnect: Failed to connect error=endpoint 10.103.35.26:4317: received corrupt message of type InvalidContentType error.sources=[received corrupt message of type InvalidContentType]
[ 4.304870s] WARN ThreadId(01) tracing:controller{addr=otel-jaeger-collector.observability.svc.cluster.local:4317}:endpoint{addr=10.103.35.26:4317}: linkerd_reconnect: Failed to connect error=endpoint 10.103.35.26:4317: received corrupt message of type InvalidContentType error.sources=[received corrupt message of type InvalidContentType]
[ 4.806364s] WARN ThreadId(01) tracing:controller{addr=otel-jaeger-collector.observability.svc.cluster.local:4317}:endpoint{addr=10.103.35.26:4317}: linkerd_reconnect: Failed to connect error=endpoint 10.103.35.26:4317: received corrupt message of type InvalidContentType error.sources=[received corrupt message of type InvalidContentType]
[ 5.308914s] WARN ThreadId(01) tracing:controller{addr=otel-jaeger-collector.observability.svc.cluster.local:4317}:endpoint{addr=10.103.35.26:4317}: linkerd_reconnect: Failed to connect error=endpoint 10.103.35.26:4317: received corrupt message of type InvalidContentType error.sources=[received corrupt message of type InvalidContentType]
[ 5.810609s] WARN ThreadId(01) tracing:controller{addr=otel-jaeger-collector.observability.svc.cluster.local:4317}:endpoint{addr=10.103.35.26:4317}: linkerd_reconnect: Failed to connect error=endpoint 10.103.35.26:4317: received corrupt message of type InvalidContentType error.sources=[received corrupt message of type InvalidContentType]
[ 6.312280s] WARN ThreadId(01) tracing:controller{addr=otel-jaeger-collector.observability.svc.cluster.local:4317}:endpoint{addr=10.103.35.26:4317}: linkerd_reconnect: Failed to connect error=endpoint 10.103.35.26:4317: received corrupt message of type InvalidContentType error.sources=[received corrupt message of type InvalidContentType]
[ 6.814769s] WARN ThreadId(01) tracing:controller{addr=otel-jaeger-collector.observability.svc.cluster.local:4317}:endpoint{addr=10.103.35.26:4317}: linkerd_reconnect: Failed to connect error=endpoint 10.103.35.26:4317: received corrupt message of type InvalidContentType error.sources=[received corrupt message of type InvalidContentType]
[ 7.316512s] WARN ThreadId(01) tracing:controller{addr=otel-jaeger-collector.observability.svc.cluster.local:4317}:endpoint{addr=10.103.35.26:4317}: linkerd_reconnect: Failed to connect error=endpoint 10.103.35.26:4317: received corrupt message of type InvalidContentType error.sources=[received corrupt message of type InvalidContentType]
[ 7.818182s] WARN ThreadId(01) tracing:controller{addr=otel-jaeger-collector.observability.svc.cluster.local:4317}:endpoint{addr=10.103.35.26:4317}: linkerd_reconnect: Failed to connect error=endpoint 10.103.35.26:4317: received corrupt message of type InvalidContentType error.sources=[received corrupt message of type InvalidContentType]
[ 8.320974s] WARN ThreadId(01) tracing:controller{addr=otel-jaeger-collector.observability.svc.cluster.local:4317}:endpoint{addr=10.103.35.26:4317}: linkerd_reconnect: Failed to connect error=endpoint 10.103.35.26:4317: received corrupt message of type InvalidContentType error.sources=[received corrupt message of type InvalidContentType]
[ 8.822630s] WARN ThreadId(01) tracing:controller{addr=otel-jaeger-collector.observability.svc.cluster.local:4317}:endpoint{addr=10.103.35.26:4317}: linkerd_reconnect: Failed to connect error=endpoint 10.103.35.26:4317: received corrupt message of type InvalidContentType error.sources=[received corrupt message of type InvalidContentType]
[ 9.325652s] WARN ThreadId(01) tracing:controller{addr=otel-jaeger-collector.observability.svc.cluster.local:4317}:endpoint{addr=10.103.35.26:4317}: linkerd_reconnect: Failed to connect error=endpoint 10.103.35.26:4317: received corrupt message of type InvalidContentType error.sources=[received corrupt message of type InvalidContentType]
[ 9.828053s] WARN ThreadId(01) tracing:controller{addr=otel-jaeger-collector.observability.svc.cluster.local:4317}:endpoint{addr=10.103.35.26:4317}: linkerd_reconnect: Failed to connect error=endpoint 10.103.35.26:4317: received corrupt message of type InvalidContentType error.sources=[received corrupt message of type InvalidContentType]
[ 10.330472s] WARN ThreadId(01) tracing:controller{addr=otel-jaeger-collector.observability.svc.cluster.local:4317}:endpoint{addr=10.103.35.26:4317}: linkerd_reconnect: Failed to connect error=endpoint 10.103.35.26:4317: received corrupt message of type InvalidContentType error.sources=[received corrupt message of type InvalidContentType]
[ 10.833466s] WARN ThreadId(01) tracing:controller{addr=otel-jaeger-collector.observability.svc.cluster.local:4317}:endpoint{addr=10.103.35.26:4317}: linkerd_reconnect: Failed to connect error=endpoint 10.103.35.26:4317: received corrupt message of type InvalidContentType error.sources=[received corrupt message of type InvalidContentType]
[ 11.336577s] WARN ThreadId(01) tracing:controller{addr=otel-jaeger-collector.observability.svc.cluster.local:4317}:endpoint{addr=10.103.35.26:4317}: linkerd_reconnect: Failed to connect error=endpoint 10.103.35.26:4317: received corrupt message of type InvalidContentType error.sources=[received corrupt message of type InvalidContentType]
[ 11.838947s] WARN ThreadId(01) tracing:controller{addr=otel-jaeger-collector.observability.svc.cluster.local:4317}:endpoint{addr=10.103.35.26:4317}: linkerd_reconnect: Failed to connect error=endpoint 10.103.35.26:4317: received corrupt message of type InvalidContentType error.sources=[received corrupt message of type InvalidContentType]
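
A note on the repeated warning: `InvalidContentType` is, as far as I can tell, what the proxy's TLS library reports when the first byte of the peer's reply is not a known TLS record type. The sketch below illustrates that check. It assumes, which these logs alone do not prove, that the unmeshed collector answered the proxy's TLS handshake with a plaintext HTTP/2 frame. The function name and both sample byte strings are hypothetical, made up for illustration.

```python
from typing import Optional

# TLS record content types (RFC 8446, plus heartbeat from RFC 6520).
TLS_CONTENT_TYPES = {
    20: "change_cipher_spec",
    21: "alert",
    22: "handshake",
    23: "application_data",
    24: "heartbeat",
}


def tls_content_type(reply: bytes) -> Optional[str]:
    """Return the TLS record type named by the first byte, or None if invalid."""
    if not reply:
        return None
    return TLS_CONTENT_TYPES.get(reply[0])


# Hypothetical plaintext HTTP/2 SETTINGS frame header:
# 3-byte length, 1-byte type (0x04), 1-byte flags, 4-byte stream id.
plaintext_h2_settings = bytes([0x00, 0x00, 0x00, 0x04, 0x00, 0x00, 0x00, 0x00, 0x00])

# Hypothetical start of a TLS handshake record: type 0x16, version 0x0303, length.
tls_handshake_record = bytes([0x16, 0x03, 0x03, 0x00, 0x7A])

print(tls_content_type(plaintext_h2_settings))  # no valid TLS record type
print(tls_content_type(tls_handshake_record))
```

Under that assumption, a TLS client reading the plaintext reply sees a leading byte of 0x00, which maps to no record type, and gives up on the connection.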

output of linkerd check -o short

linkerd-jaeger

√ linkerd-jaeger extension Namespace exists
√ jaeger extension pods are injected
√ jaeger injector pods are running
√ jaeger extension proxies are healthy
‼ jaeger extension proxies are up-to-date
    some proxies are not running the current version:
    * jaeger-injector-757c9d455f-cqqn4 (edge-25.5.4)
    see https://linkerd.io/2/checks/#l5d-jaeger-proxy-cp-version for hints
√ jaeger extension proxies and cli versions match

linkerd-viz

√ linkerd-viz Namespace exists
√ can initialize the client
√ linkerd-viz ClusterRoles exist
√ linkerd-viz ClusterRoleBindings exist
√ tap API server has valid cert
√ tap API server cert is valid for at least 60 days
√ tap API service is running
√ linkerd-viz pods are injected
√ viz extension pods are running
√ viz extension proxies are healthy
‼ viz extension proxies are up-to-date
    some proxies are not running the current version:
    * metrics-api-7c986c6457-jb2pp (edge-25.5.4)
    * tap-549c5d497b-pxh5m (edge-25.5.4)
    * tap-injector-b5658dd67-kbr9h (edge-25.5.4)
    * web-67b866ddd9-fzz66 (edge-25.5.4)
    see https://linkerd.io/2/checks/#l5d-viz-proxy-cp-version for hints
√ viz extension proxies and cli versions match
√ viz extension self-check

linkerd-smi

√ linkerd-smi extension Namespace exists
√ SMI extension service account exists
√ SMI extension pods are injected
√ SMI extension pods are running
√ SMI extension proxies are healthy

Status check results are √

Environment

minikube

Possible solution

no

Additional context

No response

Would you like to work on fixing this bug?

None

wxiangliang avatar Jun 04 '25 08:06 wxiangliang

I've got the same issue. Another service (traefik) also reports to that port on our side, and that works as expected, but linkerd does not. It might be related to the fact that there is no way to choose between insecure and secure communication between otel and linkerd-proxy.

eBeyond avatar Jun 04 '25 12:06 eBeyond

Seeing the same error.

Our collector is not meshed, and the Helm chart states that the proxy.tracing.collector.meshIdentity value should be left blank (see https://github.com/linkerd/linkerd2/blob/92560826782c1e4268f9f717a709ecc97d365112/charts/linkerd-control-plane/values.yaml#L295-L298). Unfortunately, leaving it blank results in an error when the linkerd components start.

Setting it to the collector's ServiceAccount then leads to the errors stated above.

patst avatar Jul 01 '25 10:07 patst

Same issue — on the Jaeger collector I get the following error:

{"level":"info","ts":1753191899.3153877,"caller":"[email protected]/server.go:983","msg":"[core][Server #3] grpc: Server.Serve failed to create ServerTransport: connection error: desc = \"transport: http2Server.HandleStreams received bogus greeting from client: \\\"\\\\x16\\\\x03\\\\x01\\\\x01\\\\v\\\\x01\\\\x00\\\\x01\\\\a\\\\x03\\\\x03;>Ei$\\\\xed\\\\x7f~\\\\xaf+\\\\\\\"(\\\\x14\\\"\"","system":"grpc","grpc_log":true}

I tried specifying the IP in collectorSvcAddr — same result. I tried changing the port for otel-grpc to 11211 — no effect. I also tried adding config.linkerd.io/skip-outbound-ports, skip-inbound-ports, and config.linkerd.io/opaque-ports in different combinations — none of it had any effect. The linkerd-proxy still tries to establish a TLS connection to the Jaeger collector.
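
The "bogus greeting" bytes in the collector log can be decoded by hand, and they back up the claim that the proxy is opening with TLS. A minimal sketch follows, assuming Go's escapes `\v` and `\a` in that log stand for 0x0b and 0x07. The function name is hypothetical, and only the first eleven bytes of the logged greeting are used.

```python
# The client preface a plaintext HTTP/2 (h2c) server expects first (RFC 9113).
H2_PREFACE = b"PRI * HTTP/2.0\r\n\r\nSM\r\n\r\n"


def describe_greeting(data: bytes) -> str:
    """Classify the first bytes a client sent on a new connection."""
    if data.startswith(H2_PREFACE):
        return "plaintext HTTP/2 preface"
    # TLS record header: content type 0x16 (handshake), major version 0x03,
    # minor version, then a 2-byte big-endian record length.
    if len(data) >= 6 and data[0] == 0x16 and data[1] == 0x03:
        record_len = int.from_bytes(data[3:5], "big")
        if data[5] == 0x01:  # handshake message type 1 = ClientHello
            return f"TLS ClientHello (record length {record_len})"
        return f"TLS handshake record (length {record_len})"
    return "unknown"


# First bytes of the greeting from the collector log, unescaped by hand.
greeting = b"\x16\x03\x01\x01\x0b\x01\x00\x01\x07\x03\x03"

print(describe_greeting(greeting))  # TLS ClientHello (record length 267)
```

So the plaintext gRPC server received a TLS ClientHello where it expected the HTTP/2 preface, which matches the proxy attempting TLS against a collector port that is not serving TLS.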

deltasem avatar Jul 23 '25 07:07 deltasem

The logs here -- to confirm, those are from the proxy on the OTel collector, correct?

kflynn avatar Jul 24 '25 16:07 kflynn

@kflynn Yeah, this is a separately deployed OTel collector. The logs in linkerd-proxy are similar to the original post:

received corrupt message of type InvalidContentType error

deltasem avatar Jul 24 '25 16:07 deltasem

Thanks @deltasem! Just out of curiosity, what's the use case for leaving the collector unmeshed? 🤔

kflynn avatar Jul 24 '25 16:07 kflynn

@kflynn The collector was already deployed earlier and has been running reliably for a long time. Linkerd was introduced later. A lot of infrastructure components send telemetry to the unmeshed collector (some of them unmeshed as well), and I'm not sure what side effects meshing it might cause — so I’d rather not risk it. Also, I’m aware mTLS adds some CPU overhead (though maybe I’m overestimating it in this case).

deltasem avatar Jul 24 '25 19:07 deltasem

@deltasem Sorry for the delay here!

At the moment, Linkerd requires the collector to be meshed; the proxy just won't do anything other than mTLS to the collector. 😐 That could, in theory, be changed, so knowing more about the use cases is important.

In general, meshing the collector should be safe: assuming you haven't set up authorization to prevent it, non-meshed clients will still be able to talk to it, and you shouldn't see dramatic overhead from mTLS. But I appreciate that it's a change.

kflynn avatar Aug 06 '25 18:08 kflynn

This issue has been automatically marked as stale because it has not had recent activity. It will be closed in 14 days if no further activity occurs. Thank you for your contributions.

stale[bot] avatar Nov 19 '25 04:11 stale[bot]