Service Profile does not get detected on emissary ingress deployment
What is the issue?
We are running Linkerd 2.14 on our AWS EKS cluster. We are trying to get per-route metrics from ServiceProfiles for our Emissary API gateway, which is deployed directly behind our AWS ALB ingress, but no per-route metrics show up for the ServiceProfile.
How can it be reproduced?
Deploy the Emissary API gateway and expose it as a Service behind the Ingress for the relevant hosts. Deploy a ServiceProfile for the service and send requests to its endpoints.
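For reference, a ServiceProfile for the podinfo service that appears in the captures in this report could look roughly like the sketch below. The route name and the single GET /healthz route are assumptions for illustration, not the exact manifest that was deployed. The profile name is the fully qualified name of the Service.

```yaml
# Hypothetical ServiceProfile for podinfo; route details are illustrative.
apiVersion: linkerd.io/v1alpha2
kind: ServiceProfile
metadata:
  # ServiceProfiles are named after the FQDN of the Service they describe.
  name: podinfo-svc.podinfo.svc.cluster.local
  namespace: podinfo
spec:
  routes:
    - name: GET /healthz
      condition:
        method: GET
        pathRegex: /healthz
```

With a profile like this in place, per-route metrics would be expected in the output of `linkerd viz routes svc/podinfo-svc -n podinfo`.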
Logs, error output, etc
As per the docs for ServiceProfiles:
The destination service for a request is computed by selecting the value of the first header to exist of, l5d-dst-override, :authority, and Host
I interpret this as: whichever of these headers Linkerd sees first is the one used to select the ServiceProfile. Please correct me if I am wrong.
Since I can't attach a debug container to an emissary pod, here is the output for another podinfo pod with the linkerd-debug sidecar attached:
Hypertext Transfer Protocol
    GET /healthz HTTP/1.1\r\n
        [Expert Info (Chat/Sequence): GET /healthz HTTP/1.1\r\n]
            [GET /healthz HTTP/1.1\r\n]
            [Severity level: Chat]
            [Group: Sequence]
        Request Method: GET
        Request URI: /healthz
        Request Version: HTTP/1.1
    host: api.client.com\r\n
    x-forwarded-proto: https\r\n
    x-forwarded-port: 443\r\n
    user-agent: curl/8.7.1\r\n
    accept: */*\r\n
    x-envoy-expected-rq-timeout-ms: 30000\r\n
    l5d-dst-override: podinfo-svc.podinfo.svc.cluster.local:9898\r\n
    x-envoy-original-path: /healthz\r\n
    l5d-dst-canonical: podinfo-svc.podinfo.svc.cluster.local:9898\r\n
    l5d-client-id: pg-gateway.pg-gateway.serviceaccount.identity.linkerd.cluster.local\r\n
    \r\n
    [Full request URI: http://api.client.com/healthz]
    [HTTP request 2/2]
    [Prev request in frame: 32]
It seems that because the host header appears first, Linkerd picks it up, and its value does not match the name of the ServiceProfile used for podinfo.
But if I set host_rewrite in the Mapping used for podinfo, the ServiceProfile is detected correctly:
Hypertext Transfer Protocol
    GET /healthz HTTP/1.1\r\n
        [Expert Info (Chat/Sequence): GET /healthz HTTP/1.1\r\n]
            [GET /healthz HTTP/1.1\r\n]
            [Severity level: Chat]
            [Group: Sequence]
        Request Method: GET
        Request URI: /healthz
        Request Version: HTTP/1.1
    host: podinfo-svc.podinfo.svc.cluster.local:9898\r\n
    x-forwarded-proto: https\r\n
    x-forwarded-port: 443\r\n
    user-agent: curl/8.7.1\r\n
    accept: */*\r\n
    x-envoy-expected-rq-timeout-ms: 30000\r\n
    x-idfy-gateway-id: pg-gateway\r\n
    l5d-dst-override: podinfo-svc.podinfo.svc.cluster.local:9898\r\n
    x-envoy-original-path: /healthz\r\n
    l5d-dst-canonical: podinfo-svc.podinfo.svc.cluster.local:9898\r\n
    l5d-client-id: pg-gateway.pg-gateway.serviceaccount.identity.linkerd.cluster.local\r\n
    \r\n
    [Full request URI: http://podinfo-svc.podinfo.svc.cluster.local:9898/healthz]
    [HTTP request 1/1]
So it appears that l5d-dst-override is not being read.
Here is the Mapping, in case it is needed:
apiVersion: getambassador.io/v3alpha1
kind: Mapping
metadata:
  annotations:
  labels:
    argocd.argoproj.io/instance: scorpius-pg-gateway-mapping
  name: podinfo-health-check
  namespace: pg-gateway
spec:
  ambassador_id:
    - pg-api-gateway
  bypass_auth: true
  host_rewrite: 'podinfo-svc.podinfo.svc.cluster.local:9898'
  hostname: '*'
  prefix: /healthz
  rewrite: /healthz
  service: 'podinfo-svc.podinfo.svc.cluster.local:9898'
  timeout_ms: 30000
This works for podinfo because I can use a Mapping, but I cannot do the same for the Emissary ingress pods themselves: the Ingress forwards traffic directly to Emissary, so there is no Mapping in which I could rewrite the Host header for Emissary.
output of linkerd check -o short
linkerd-version
---------------
‼ cli is up-to-date
    unsupported version channel: stable-2.14.0
    see https://linkerd.io/2.14/checks/#l5d-version-cli for hints

control-plane-version
---------------------
‼ control plane is up-to-date
    unsupported version channel: stable-2.14.0
    see https://linkerd.io/2.14/checks/#l5d-version-control for hints

linkerd-control-plane-proxy
---------------------------
‼ control plane proxies are up-to-date
    some proxies are not running the current version:
        * linkerd-destination-6f6cbbf6c9-wtvvq (stable-2.14.0)
        * linkerd-identity-66dfc67478-7xdxx (stable-2.14.0)
        * linkerd-proxy-injector-67d54d5c78-7xvm4 (stable-2.14.0)
    see https://linkerd.io/2.14/checks/#l5d-cp-proxy-version for hints

linkerd-viz
-----------
‼ viz extension proxies are up-to-date
    some proxies are not running the current version:
        * metrics-api-5c4f49c9cf-kjkcf (stable-2.14.0)
        * tap-64975d56bc-7fzp7 (stable-2.14.0)
        * tap-injector-6bd696c58b-ld48v (stable-2.14.0)
        * web-556b79cddd-4v494 (stable-2.14.0)
    see https://linkerd.io/2.14/checks/#l5d-viz-proxy-cp-version for hints

Status check results are √
Environment
- Kubernetes version: 1.31
- Env: EKS
- Linkerd version: 2.14.0
Possible solution
No response
Additional context
No response
Would you like to work on fixing this bug?
None
Hey @ghostx31 – first things first, is your Emissary meshed? It must be to use Linkerd features (and if it is, then you should indeed be able to inject the Linkerd debug container into your Emissary pod).
I believe that you'll also need to mesh Emissary in ingress mode, and you will not want to mark Emissary's incoming port as a skip port. This is not what we usually recommend for meshing Emissary! However, our usual recommendations assume that Emissary is your point of ingress into the cluster, and in your case it's not: with the ALB in front of Emissary, you need to do things a bit differently.
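As a rough sketch of what meshing Emissary in ingress mode involves: the pod template of the Emissary Deployment carries the `linkerd.io/inject: ingress` annotation instead of `linkerd.io/inject: enabled`. The Deployment name and namespace below are assumptions, so substitute your own. This is a patch fragment, not a complete manifest.

```yaml
# Patch fragment for the Emissary Deployment (name and namespace are assumed).
apiVersion: apps/v1
kind: Deployment
metadata:
  name: emissary-ingress
  namespace: emissary
spec:
  template:
    metadata:
      annotations:
        # Ingress mode: the proxy routes outbound requests based on the
        # l5d-dst-override header rather than the original destination IP.
        linkerd.io/inject: ingress
        # Deliberately no config.linkerd.io/skip-inbound-ports entry for the
        # port the ALB targets, so the proxy handles that inbound traffic.
```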
The Emissary pod is indeed meshed. But I'm unable to get the debug container running on it, since Emissary does not run as root and the debug container fails during its installation step.
I haven't marked ports as skip ports, but I'll need to try to mesh emissary in ingress mode. Might take me a day or two but I'll get back to you once I try that out! Thanks for the prompt response!
Hello @kflynn, I tried setting up Emissary in ingress mode, which caused some authentication issues due to SSL errors. I was not able to debug this because other people needed the environment for testing. That is probably just me configuring something wrong.
But apart from that, I was not able to get the ServiceProfile to work anyway. Is there anything I could be missing that could help me figure this issue out?
Thanks in advance!
(Are you on the Linkerd Slack? It might be a little easier to discuss this there.)
Let me back up a minute: where is TLS being terminated? What is the ALB supposed to be doing in your architecture, and what is the Emissary supposed to be doing? Have you tried enabling Emissary's debug logs so that you can see the details of exactly what Emissary is receiving from the ALB?
I think I am on the Linkerd slack, let me reach out to you there.
I work with them. We upgraded to a Linkerd edge release (24.11.8), followed this tutorial, and deployed the HTTPRoutes from the document, but we are still not able to see route-level metrics. We then redeployed the ServiceProfile from the document and hit the same issue: we can see route-level metrics when the traffic is internal, but not when it comes from Emissary ingress.
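For context, an HTTPRoute that defines a route for the podinfo health endpoint might look like the sketch below. This is not the manifest from the tutorial: the route name, the `gateway.networking.k8s.io/v1beta1` API version, and the choice of the podinfo Service as parent are assumptions based on the captures earlier in this thread.

```yaml
# Hypothetical HTTPRoute attached to the podinfo Service; names are illustrative.
apiVersion: gateway.networking.k8s.io/v1beta1
kind: HTTPRoute
metadata:
  name: podinfo-healthz
  namespace: podinfo
spec:
  parentRefs:
    # The parent is the Service itself, not a Gateway.
    - name: podinfo-svc
      kind: Service
      group: core
      port: 9898
  rules:
    - matches:
        - path:
            type: Exact
            value: /healthz
          method: GET
```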
This issue has been automatically marked as stale because it has not had recent activity. It will be closed in 14 days if no further activity occurs. Thank you for your contributions.