Prometheus metrics federation yields HTTP 403

Open adleong opened this issue 2 years ago • 14 comments

Discussed in https://github.com/linkerd/linkerd2/discussions/11044

Originally posted by ngc4579 on June 21, 2023:

Using the Prometheus federation API as advertised in the docs yields an HTTP 403 scrape error (server returned HTTP status 403 Forbidden). IIRC this used to work some time ago. Were there any (recent) changes that are possibly not reflected in the docs?

What might cause the described behaviour?

adleong avatar Jun 22 '23 22:06 adleong

Thanks for raising this, @ngc4579. It's possible that additional AuthorizationPolicies are needed for Prometheus federation. This will require some investigation.

adleong avatar Jun 22 '23 22:06 adleong

This policy was suggested by Michelle B on the Linkerd Slack (link will expire in 90 days):

apiVersion: policy.linkerd.io/v1alpha1
kind: AuthorizationPolicy
metadata:
  name: prometheus-admin-federate
  namespace: linkerd-viz
spec:
  targetRef:
    group: policy.linkerd.io
    kind: Server
    name: prometheus-admin
  requiredAuthenticationRefs:
    - group: policy.linkerd.io
      kind: NetworkAuthentication
      name: kubelet
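
To confirm whether the policy above makes a difference, the federation endpoint can be probed before and after applying it. The following is only a minimal sketch and not part of Linkerd: the Prometheus address, the two `match[]` selectors, and the helper names `federate_url`, `describe_status`, and `check_federation` are assumptions for illustration and should be adjusted to the actual setup.

```python
from urllib.error import HTTPError
from urllib.parse import urlencode
from urllib.request import urlopen

# Assumed in-cluster address of the linkerd-viz Prometheus; adjust as needed.
PROMETHEUS_URL = "http://prometheus.linkerd-viz.svc.cluster.local:9090"

# Assumed selectors for the series to federate; match them to your scrape config.
MATCHERS = ['{job="linkerd-proxy"}', '{job="linkerd-controller"}']


def federate_url(base_url, matchers):
    """Build a /federate URL with one match[] parameter per selector."""
    query = urlencode([("match[]", m) for m in matchers])
    return f"{base_url.rstrip('/')}/federate?{query}"


def describe_status(status):
    """Map an HTTP status code to a short hint relevant to this issue."""
    if status == 200:
        return "ok: federation endpoint is reachable"
    if status == 403:
        return "forbidden: the request was denied, possibly by policy"
    return f"unexpected status {status}"


def check_federation(base_url=PROMETHEUS_URL, matchers=MATCHERS, timeout=5):
    """Request the federation endpoint and return (status, hint)."""
    try:
        with urlopen(federate_url(base_url, matchers), timeout=timeout) as resp:
            status = resp.status
    except HTTPError as err:
        # urlopen raises on 4xx/5xx; the status code is still what we want.
        status = err.code
    return status, describe_status(status)


if __name__ == "__main__":
    print(check_federation())
```

The script only reports a status when it runs from a client that can reach the Prometheus service, for example from the pod that performs the federation scrape. Connection failures (DNS, timeouts) are not handled here and will surface as exceptions.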

wmorgan avatar Jun 22 '23 23:06 wmorgan

Thanks so much @adleong @wmorgan for your answers. The mentioned AuthorizationPolicy actually did help, federation works as expected now. If this policy is intentionally required, I guess this should be reflected in the docs. (Or else, if it already is, it seems I wasn't able to find it. :) )

ngc4579 avatar Jun 23 '23 04:06 ngc4579

We have set up linkerd-viz with an external Prometheus, and after the upgrade we are getting the following errors:

time="2023-06-26T12:34:55Z" level=error msg="queryProm failed with: Query failed: \"sum(increase(response_total{deployment=\\\"app-prod-http\\\", direction=\\\"outbound\\\", namespace=\\\"web\\\"}[1m])) by (dst_namespace, dst_deployment, classification, tls)\": Post \"https://external-endpoint/api/v1/query\": context canceled"

prajithp13 avatar Jun 26 '23 12:06 prajithp13

Would anybody like to submit a PR with this policy included? It should be pretty straightforward.

@prajithp13 Did you apply the policy?

alpeb avatar Jun 29 '23 18:06 alpeb

@alpeb I'd like to pick this up. I'm learning Linkerd and service meshes in general and would also like to contribute to the project, so this seems like a good issue to start with.

deepto98 avatar Jul 01 '23 01:07 deepto98

@deepto98 sounds great, please proceed!

alpeb avatar Jul 10 '23 13:07 alpeb

@deepto98 Are you working on this? If not, I will be willing to tackle this issue :)

alexandreliberato avatar Jul 18 '23 23:07 alexandreliberato

I'll pick this up this week

deepto98 avatar Jul 22 '23 23:07 deepto98

Did a PR for this issue ever get created?

jderieg avatar Aug 23 '23 20:08 jderieg

Hey, is there any progress on this issue?

ioannatheo avatar Dec 15 '23 13:12 ioannatheo

@ioannatheo there is a workaround: apply the policy YAML pasted earlier in this thread. A PR to add it by default would be welcome.

wmorgan avatar Dec 15 '23 20:12 wmorgan

I am actively working on this. I think I have a pretty good understanding of what needs to be done. Track progress here: https://github.com/francRang/linkerd2. Give me 1-2 days max and I should be able to get it ready for review.

francRang avatar Mar 04 '24 05:03 francRang