linkerd2
viz: use the same proxy admin policies in `install` and `allow-scrapes`
Currently, the `linkerd viz install` command generates policy resources
to authorize scrapes in the `linkerd-viz` namespace. However, these
resources are different from the ones generated by the `linkerd viz allow-scrapes` command.
When the cluster (or the `linkerd-viz` namespace in particular) is
configured with `config.linkerd.io/default-inbound-policy: deny`,
`linkerd viz check --proxy` will currently emit a false positive for the
warning that checks if Prometheus is authorized to scrape data plane
pods, for the pods in the `linkerd-viz` namespace. This is because the
policies authorizing scrapes of those pods have different names from the
ones generated by `allow-scrapes`, so the check does not notice that
scrapes are actually allowed.
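To illustrate why a name-based lookup produces this false warning, here is a minimal Python sketch. It is not the actual `linkerd viz check` implementation; the resource names (`prometheus-scrape`, `proxy-admin`) and the helper `scrape_check_warns` are hypothetical and only show the mismatch.

```python
# Hypothetical resource names: the real policy names are not given in this description.
EXPECTED_BY_CHECK = {"prometheus-scrape"}  # assumed name created by allow-scrapes


def scrape_check_warns(policy_names_in_namespace):
    """Return True if a name-based check would warn that scrapes are unauthorized."""
    return not EXPECTED_BY_CHECK.issubset(set(policy_names_in_namespace))


# Before this change: install creates an equivalent policy under a different name.
installed_before = {"proxy-admin"}  # assumed name
# After this change: install renders the same template as allow-scrapes.
installed_after = {"prometheus-scrape"}

print(scrape_check_warns(installed_before))  # warns even though scrapes are allowed
print(scrape_check_warns(installed_after))   # no warning
```

Under these assumptions, the check warns in the first case purely because of the name difference, which is the behavior described above.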
This branch fixes this issue by replacing the proxy admin policy
currently generated by the Helm chart for `linkerd-viz`
(`templates/proxy-admin-policy.yaml`) with a template for the
`allow-scrapes` policy. The `allow-scrapes` command was changed to
template only this chart in the desired target namespace. This way,
identical policies are generated by `viz install` and `viz allow-scrapes`,
fixing the incorrect warning in `viz check --proxy`.
In addition, there are some minor side benefits of this change:
- The old policies for the proxy admin server in the `linkerd-viz` namespace were somewhat more permissive, as they did not include `HTTPRoute`s, and instead allowed complete access to all routes on the admin port: https://github.com/linkerd/linkerd2/blob/main/viz/charts/linkerd-viz/templates/proxy-admin-policy.yaml
  The new policies have separate `HTTPRoute`s for probes and for `/metrics`, and authorize any IP to access only the `/live` and `/ready` routes. Access to the `/metrics` route is authorized for Prometheus only, and no authorizations are permitted for other proxy admin routes in the `linkerd-viz` namespace out of the box: https://github.com/linkerd/linkerd2/blob/eliza/helm-allow-scrapes/viz/charts/linkerd-viz/templates/allow-scrapes-policy.yaml
- We now have a single source of truth for these policies that is used by both `linkerd viz install` and `linkerd viz allow-scrapes`. This means that if we make future changes to the policy, we don't need to modify it in multiple places.
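The difference between the old and new policies in the first bullet can be modeled roughly as follows. This is only a Python sketch of the described behavior, not the Helm templates themselves; the identity string for Prometheus and the `/shutdown` route used as "some other admin route" are assumptions for illustration.

```python
from typing import Optional

# Assumed identity of the Prometheus ServiceAccount in linkerd-viz (illustrative only).
PROMETHEUS_ID = "prometheus.linkerd-viz.serviceaccount.identity.linkerd.cluster.local"

PROBE_ROUTES = {"/live", "/ready"}


def old_policy_allows(path: str, client_id: Optional[str]) -> bool:
    """Old behavior as described: every admin route is open to every client."""
    return True


def new_policy_allows(path: str, client_id: Optional[str]) -> bool:
    """New behavior as described: probes for anyone, /metrics for Prometheus only."""
    if path in PROBE_ROUTES:
        return True
    if path == "/metrics":
        return client_id == PROMETHEUS_ID
    return False  # no authorization exists for any other admin route


# An unauthenticated client (client_id=None) probing different routes.
for path in ("/ready", "/metrics", "/shutdown"):
    print(path, old_policy_allows(path, None), new_policy_allows(path, None))
```

In this model an unauthenticated client can still reach the probe routes, but is refused on `/metrics` and on any other admin route.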
I've manually verified that the check output is correct on this branch, and that the new install still works on a default-deny cluster.
Okay, so one update on this is that it turns out the reason the integration tests were failing until commit 85e8a4fdb7daed5f1df1f60189637b732b03dbec is that the new policies weren't allowing Prometheus to scrape its own proxies (and the integration tests specifically test for the presence of Prometheus' outbound metrics to the linkerd ns).
This is because the connections initiated by Prometheus to its own proxy aren't mTLS (since the proxy cannot initiate mTLS to...itself), so they're not authenticated by the MeshTLSAuthentication policy that authenticates connections with the prometheus ServiceAccount.
Unfortunately, there isn't currently a way to authenticate connections only from a pod to itself (but I've opened #9316 for adding this). This means that for the `linkerd-viz` namespace, we now have to create a policy that allows anyone to access proxy `/metrics` endpoints. This isn't great, since it would be ideal to only allow Prometheus to scrape the viz proxies. However, this is still somewhat better than the status quo, where all routes on the proxy admin ports in `linkerd-viz` are allowed for everyone: this way, we are at least only making `/metrics` globally accessible, rather than all the admin routes.
The `allow-scrapes` command still creates policies that only allow Prometheus' ServiceAccount to scrape `/metrics`, since we don't have this problem in other namespaces. I did this by splitting the chart up so that the authentication policy lives in its own file, with two variants: one used by `install` and the other used by `allow-scrapes`.
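The two authentication variants, and why the self-scrape fails under the stricter one, can be sketched like this. It is a simplified Python model rather than the chart contents; the identity string and function names are assumptions.

```python
from typing import Optional

# Assumed identity of the Prometheus ServiceAccount (illustrative only).
PROMETHEUS_ID = "prometheus.linkerd-viz.serviceaccount.identity.linkerd.cluster.local"


def allow_scrapes_variant(client_id: Optional[str]) -> bool:
    """Variant used by allow-scrapes: only the Prometheus mTLS identity may scrape."""
    return client_id == PROMETHEUS_ID


def install_variant(client_id: Optional[str]) -> bool:
    """Variant used by install in linkerd-viz: any client may reach /metrics."""
    return True


# Prometheus scraping its own proxy: the connection is not mTLS, so the
# proxy sees no client identity at all.
self_scrape_id = None

print(allow_scrapes_variant(self_scrape_id))  # False: self-scrape would be denied
print(install_variant(self_scrape_id))        # True: self-scrape works
print(allow_scrapes_variant(PROMETHEUS_ID))   # True: scrapes of other pods work
```

The trade-off is visible in the second function: it accepts every client, which is what makes `/metrics` globally accessible in the `linkerd-viz` namespace.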
I'm a little worried that this feature (not the implementation) will create friction when other clients need to access the proxy, especially if this is used in non-default-deny clusters. We also should think through how this should work for external-prometheus setups. I'm marking this as a stable-2.13 feature so we have some more time to think about this. Please don't merge this until we've branched main for 2.12 :)
Closing for now until we have more time for holistic design.
kubectl logs --all-containers  -n linkerd-viz -l linkerd.io/control-plane-ns=linkerd --max-log-requests=15  -f
[1892678.730191s]  INFO ThreadId(01) inbound:server{port=9090}: linkerd_app_inbound::policy::tcp: Connection denied server.group=policy.linkerd.io server.kind=server server.name=prometheus-admin tls=Some(Established { client_id: Some(ClientId(Name("metrics-api.linkerd-viz.serviceaccount.identity.linkerd.cluster.local"))), negotiated_protocol: None }) client=10.2.76.70:52018
[1892678.730242s]  INFO ThreadId(01) inbound: linkerd_app_core::serve: Connection closed error=unauthorized connection on server/prometheus-admin client.addr=10.2.76.70:52018
[1892678.731145s]  INFO ThreadId(01) inbound:server{port=9090}: linkerd_app_inbound::policy::tcp: Connection denied server.group=policy.linkerd.io server.kind=server server.name=prometheus-admin tls=Some(Established { client_id: Some(ClientId(Name("metrics-api.linkerd-viz.serviceaccount.identity.linkerd.cluster.local"))), negotiated_protocol: None }) client=10.2.76.70:52034
[1892678.731215s]  INFO ThreadId(01) inbound: linkerd_app_core::serve: Connection closed error=unauthorized connection on server/prometheus-admin client.addr=10.2.76.70:52034
[1892678.769115s]  INFO ThreadId(01) inbound:server{port=9090}: linkerd_app_inbound::policy::tcp: Connection denied server.group=policy.linkerd.io server.kind=server server.name=prometheus-admin tls=Some(Established { client_id: Some(ClientId(Name("metrics-api.linkerd-viz.serviceaccount.identity.linkerd.cluster.local"))), negotiated_protocol: None }) client=10.2.76.70:52042
[1892678.769306s]  INFO ThreadId(01) inbound: linkerd_app_core::serve: Connection closed error=unauthorized connection on server/prometheus-admin client.addr=10.2.76.70:52042
[1892678.770592s]  INFO ThreadId(01) inbound:server{port=9090}: linkerd_app_inbound::policy::tcp: Connection denied server.group=policy.linkerd.io server.kind=server server.name=prometheus-admin tls=Some(Established { client_id: Some(ClientId(Name("metrics-api.linkerd-viz.serviceaccount.identity.linkerd.cluster.local"))), negotiated_protocol: None }) client=10.2.76.70:52056
[1892678.770651s]  INFO ThreadId(01) inbound: linkerd_app_core::serve: Connection closed error=unauthorized connection on server/prometheus-admin client.addr=10.2.76.70:52056
[1892678.771077s]  INFO ThreadId(01) inbound:server{port=9090}: linkerd_app_inbound::policy::tcp: Connection denied server.group=policy.linkerd.io server.kind=server server.name=prometheus-admin tls=Some(Established { client_id: Some(ClientId(Name("metrics-api.linkerd-viz.serviceaccount.identity.linkerd.cluster.local"))), negotiated_protocol: None }) client=10.2.76.70:52062
[1892678.771126s]  INFO ThreadId(01) inbound: linkerd_app_core::serve: Connection closed error=unauthorized connection on server/prometheus-admin client.addr=10.2.76.70:52062
[1892678.771990s]  INFO ThreadId(01) inbound:server{port=9090}: linkerd_app_inbound::policy::tcp: Connection denied server.group=policy.linkerd.io server.kind=server server.name=prometheus-admin tls=Some(Established { client_id: Some(ClientId(Name("metrics-api.linkerd-viz.serviceaccount.identity.linkerd.cluster.local"))), negotiated_protocol: None }) client=10.2.76.70:52088
[1892678.772040s]  INFO ThreadId(01) inbound: linkerd_app_core::serve: Connection closed error=unauthorized connection on server/prometheus-admin client.addr=10.2.76.70:52088
[1892678.772725s]  INFO ThreadId(01) inbound:server{port=9090}: linkerd_app_inbound::policy::tcp: Connection denied server.group=policy.linkerd.io server.kind=server server.name=prometheus-admin tls=Some(Established { client_id: Some(ClientId(Name("metrics-api.linkerd-viz.serviceaccount.identity.linkerd.cluster.local"))), negotiated_protocol: None }) client=10.2.76.70:52044
[1892678.772774s]  INFO ThreadId(01) inbound: linkerd_app_core::serve: Connection closed error=unauthorized connection on server/prometheus-admin client.addr=10.2.76.70:52044
[1892678.773289s]  INFO ThreadId(01) inbound:server{port=9090}: linkerd_app_inbound::policy::tcp: Connection denied server.group=policy.linkerd.io server.kind=server server.name=prometheus-admin tls=Some(Established { client_id: Some(ClientId(Name("metrics-api.linkerd-viz.serviceaccount.identity.linkerd.cluster.local"))), negotiated_protocol: None }) client=10.2.76.70:52076
[1892678.773336s]  INFO ThreadId(01) inbound: linkerd_app_core::serve: Connection closed error=unauthorized connection on server/prometheus-admin client.addr=10.2.76.70:52076
[1892678.773839s]  INFO ThreadId(01) inbound:server{port=9090}: linkerd_app_inbound::policy::tcp: Connection denied server.group=policy.linkerd.io server.kind=server server.name=prometheus-admin tls=Some(Established { client_id: Some(ClientId(Name("metrics-api.linkerd-viz.serviceaccount.identity.linkerd.cluster.local"))), negotiated_protocol: None }) client=10.2.76.70:52106
[1892678.773892s]  INFO ThreadId(01) inbound: linkerd_app_core::serve: Connection closed error=unauthorized connection on server/prometheus-admin client.addr=10.2.76.70:52106
[1892678.774386s]  INFO ThreadId(01) inbound:server{port=9090}: linkerd_app_inbound::policy::tcp: Connection denied server.group=policy.linkerd.io server.kind=server server.name=prometheus-admin tls=Some(Established { client_id: Some(ClientId(Name("metrics-api.linkerd-viz.serviceaccount.identity.linkerd.cluster.local"))), negotiated_protocol: None }) client=10.2.76.70:52096
[1892678.774437s]  INFO ThreadId(01) inbound: linkerd_app_core::serve: Connection closed error=unauthorized connection on server/prometheus-admin client.addr=10.2.76.70:52096
[1892678.776921s]  INFO ThreadId(01) inbound:server{port=9090}: linkerd_app_inbound::policy::tcp: Connection denied server.group=policy.linkerd.io server.kind=server server.name=prometheus-admin tls=Some(Established { client_id: Some(ClientId(Name("metrics-api.linkerd-viz.serviceaccount.identity.linkerd.cluster.local"))), negotiated_protocol: None }) client=10.2.76.70:52118
[1892678.776962s]  INFO ThreadId(01) inbound: linkerd_app_core::serve: Connection closed error=unauthorized connection on server/prometheus-admin client.addr=10.2.76.70:52118
[1892678.777768s]  INFO ThreadId(01) inbound:server{port=9090}: linkerd_app_inbound::policy::tcp: Connection denied server.group=policy.linkerd.io server.kind=server server.name=prometheus-admin tls=Some(Established { client_id: Some(ClientId(Name("metrics-api.linkerd-viz.serviceaccount.identity.linkerd.cluster.local"))), negotiated_protocol: None }) client=10.2.76.70:52110
[1892678.777805s]  INFO ThreadId(01) inbound: linkerd_app_core::serve: Connection closed error=unauthorized connection on server/prometheus-admin client.addr=10.2.76.70:52110
[1892678.779248s]  INFO ThreadId(01) inbound:server{port=9090}: linkerd_app_inbound::policy::tcp: Connection denied server.group=policy.linkerd.io server.kind=server server.name=prometheus-admin tls=Some(Established { client_id: Some(ClientId(Name("metrics-api.linkerd-viz.serviceaccount.identity.linkerd.cluster.local"))), negotiated_protocol: None }) client=10.2.76.70:52128
[1892678.779315s]  INFO ThreadId(01) inbound: linkerd_app_core::serve: Connection closed error=unauthorized connection on server/prometheus-admin client.addr=10.2.76.70:52128
[1892678.779837s]  INFO ThreadId(01) inbound:server{port=9090}: linkerd_app_inbound::policy::tcp: Connection denied server.group=policy.linkerd.io server.kind=server server.name=prometheus-admin tls=Some(Established { client_id: Some(ClientId(Name("metrics-api.linkerd-viz.serviceaccount.identity.linkerd.cluster.local"))), negotiated_protocol: None }) client=10.2.76.70:52134
[1892678.779886s]  INFO ThreadId(01) inbound: linkerd_app_core::serve: Connection closed error=unauthorized connection on server/prometheus-admin client.addr=10.2.76.70:52134
[1892678.780387s]  INFO ThreadId(01) inbound:server{port=9090}: linkerd_app_inbound::policy::tcp: Connection denied server.group=policy.linkerd.io server.kind=server server.name=prometheus-admin tls=Some(Established { client_id: Some(ClientId(Name("metrics-api.linkerd-viz.serviceaccount.identity.linkerd.cluster.local"))), negotiated_protocol: None }) client=10.2.76.70:52144
[1892678.780429s]  INFO ThreadId(01) inbound: linkerd_app_core::serve: Connection closed error=unauthorized connection on server/prometheus-admin client.addr=10.2.76.70:52144
[1892678.781369s]  INFO ThreadId(01) inbound:server{port=9090}: linkerd_app_inbound::policy::tcp: Connection denied server.group=policy.linkerd.io server.kind=server server.name=prometheus-admin tls=Some(Established { client_id: Some(ClientId(Name("metrics-api.linkerd-viz.serviceaccount.identity.linkerd.cluster.local"))), negotiated_protocol: None }) client=10.2.76.70:52148
[1892678.781416s]  INFO ThreadId(01) inbound: linkerd_app_core::serve: Connection closed error=unauthorized connection on server/prometheus-admin client.addr=10.2.76.70:52148
[1892686.087146s]  INFO ThreadId(01) inbound:server{port=9090}: linkerd_app_inbound::policy::tcp: Connection denied server.group=policy.linkerd.io server.kind=server server.name=prometheus-admin tls=Some(Established { client_id: Some(ClientId(Name("metrics-api.linkerd-viz.serviceaccount.identity.linkerd.cluster.local"))), negotiated_protocol: None }) client=10.2.76.70:57228
[1892686.087225s]  INFO ThreadId(01) inbound: linkerd_app_core::serve: Connection closed error=unauthorized connection on server/prometheus-admin client.addr=10.2.76.70:57228
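To see at a glance which identity is being refused on which Server, log lines like these can be tallied. The following Python sketch uses a regular expression fitted to the lines quoted above; that pattern is an assumption, not a stable log format, and `summarize_denials` is a hypothetical helper.

```python
import re
from collections import Counter

# Matches "Connection denied" lines that carry an mTLS client identity.
# Lines from clients without an identity would not match this pattern.
DENIED = re.compile(
    r'Connection denied .*?server\.name=(?P<server>\S+)'
    r'.*?Name\("(?P<client_id>[^"]+)"\)'
)


def summarize_denials(lines):
    """Count denied connections per (server name, client identity)."""
    counts = Counter()
    for line in lines:
        match = DENIED.search(line)
        if match:
            counts[(match["server"], match["client_id"])] += 1
    return counts


# Two lines copied from the output above: one denial and its "closed" follow-up.
sample = [
    '[1892678.730191s]  INFO ThreadId(01) inbound:server{port=9090}: linkerd_app_inbound::policy::tcp: Connection denied server.group=policy.linkerd.io server.kind=server server.name=prometheus-admin tls=Some(Established { client_id: Some(ClientId(Name("metrics-api.linkerd-viz.serviceaccount.identity.linkerd.cluster.local"))), negotiated_protocol: None }) client=10.2.76.70:52018',
    '[1892678.730242s]  INFO ThreadId(01) inbound: linkerd_app_core::serve: Connection closed error=unauthorized connection on server/prometheus-admin client.addr=10.2.76.70:52018',
]

for (server, client_id), n in summarize_denials(sample).items():
    print(f"{n} denied on {server} from {client_id}")
```

Applied to the full output, every denial is for the `metrics-api` ServiceAccount identity on the `prometheus-admin` Server (port 9090).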