
linkerd-init: overwrite check for existing iptables rules

Open mattstam opened this issue 3 years ago • 4 comments

What is the issue?

When client pods injected with linkerd-proxy are restarted periodically at scale, there is a low-probability chance that one of them cannot communicate with server pods (also injected with linkerd-proxy) that are protected by auth policies. For instance, if you had 3 replicas in your deployment, only 1 of them may have this issue.

Server pods will show a Request denied / unauthorized connection on server server-api rejection but no indication as to why (relevant logs below). Here server-api is the ServerAuthorization server that only allows connections from client-sa.client.serviceaccount.identity.linkerd.cluster.local. I'm skeptical that this has anything to do with auth, as the client linkerd-proxy has the expected output Local identity is client-sa.client.serviceaccount.identity.linkerd.cluster.local (logs below).

The client pod that doesn't work had a linkerd-init container that skipped iptables setup.

If the bad client pod is deleted, the new one that comes up will be able to communicate properly. In all the cases I have seen, the new pod came up on a different node and did have its iptables rules set up normally. The iptables setup is the only discernible difference that I can see here.

How can it be reproduced?

I don't have good reproduction steps here. Take a fairly typical authorization policy setup, with client pods sending requests to some server pods. The server pods are protected by auth policies that only accept connections from the service account that the client pods use.

The Linkerd control plane pods don't show any signs of failure, and linkerd-proxy on the client pods shows the same output on good and bad pods. linkerd-init looks different on the bad pod because it skips iptables setup: level=info msg="skipping iptables setup: found 1 existing chains" (more logs below).

A guess is that this reproduces when one of the client pods lands on a node where a previous client pod's linkerd-init had already set up these rules.

Logs, error output, etc

client linkerd-init looks like this on good pod:

level=info msg="iptables-save -t nat"
level=info msg="# Generated by iptables-save v1.8.7 on *** \n*nat\n:PREROUTING ACCEPT [0:0]\n:INPUT ACCEPT [0:0]\n:OUTPUT ACCEPT [0:0]\n:POSTROUTING ACCEPT [0:0]\nCOMMIT\n# Completed on *** \n"
level=info msg="iptables -t nat -N PROXY_INIT_REDIRECT -m comment --comment proxy-init/redirect-common-chain/1661306970"
level=info msg="iptables -t nat -A PROXY_INIT_REDIRECT -p tcp --match multiport --dports 4190,4191,4567,4568 -j RETURN -m comment --comment proxy-init/ignore-port-4190,4191,4567,4568/1661306970"
level=info msg="iptables -t nat -A PROXY_INIT_REDIRECT -p tcp -j REDIRECT --to-port 4143 -m comment --comment proxy-init/redirect-all-incoming-to-proxy-port/1661306970"
level=info msg="iptables -t nat -A PREROUTING -j PROXY_INIT_REDIRECT -m comment --comment proxy-init/install-proxy-init-prerouting/1661306970"
level=info msg="iptables -t nat -N PROXY_INIT_OUTPUT -m comment --comment proxy-init/redirect-common-chain/1661306970"
level=info msg="iptables -t nat -A PROXY_INIT_OUTPUT -m owner --uid-owner 2102 -j RETURN -m comment --comment proxy-init/ignore-proxy-user-id/1661306970"
level=info msg="iptables -t nat -A PROXY_INIT_OUTPUT -o lo -j RETURN -m comment --comment proxy-init/ignore-loopback/1661306970"
level=info msg="iptables -t nat -A PROXY_INIT_OUTPUT -p tcp --match multiport --dports 4567,4568 -j RETURN -m comment --comment proxy-init/ignore-port-4567,4568/1661306970"
level=info msg="iptables -t nat -A PROXY_INIT_OUTPUT -p tcp -j REDIRECT --to-port 4140 -m comment --comment proxy-init/redirect-all-outgoing-to-proxy-port/1661306970"
level=info msg="iptables -t nat -A OUTPUT -j PROXY_INIT_OUTPUT -m comment --comment proxy-init/install-proxy-init-output/1661306970"
level=info msg="iptables-save -t nat"
level=info msg="# Generated by iptables-save v1.8.7 on *** \n*nat\n:PREROUTING ACCEPT [0:0]\n:INPUT ACCEPT [0:0]\n:OUTPUT ACCEPT [0:0]\n:POSTROUTING ACCEPT [0:0]\n:PROXY_INIT_OUTPUT - [0:0]\n:PROXY_INIT_REDIRECT - [0:0]\n-A PREROUTING -m comment --comment \"proxy-init/install-proxy-init-prerouting/1661306970\" -j PROXY_INIT_REDIRECT\n-A OUTPUT -m comment --comment \"proxy-init/install-proxy-init-output/1661306970\" -j PROXY_INIT_OUTPUT\n-A PROXY_INIT_OUTPUT -m owner --uid-owner 2102 -m comment --comment \"proxy-init/ignore-proxy-user-id/1661306970\" -j RETURN\n-A PROXY_INIT_OUTPUT -o lo -m comment --comment \"proxy-init/ignore-loopback/1661306970\" -j RETURN\n-A PROXY_INIT_OUTPUT -p tcp -m multiport --dports 4567,4568 -m comment --comment \"proxy-init/ignore-port-4567,4568/1661306970\" -j RETURN\n-A PROXY_INIT_OUTPUT -p tcp -m comment --comment \"proxy-init/redirect-all-outgoing-to-proxy-port/1661306970\" -j REDIRECT --to-ports 4140\n-A PROXY_INIT_REDIRECT -p tcp -m multiport --dports 4190,4191,4567,4568 -m comment --comment \"proxy-init/ignore-port-4190,4191,4567,4568/1661306970\" -j RETURN\n-A PROXY_INIT_REDIRECT -p tcp -m comment --comment \"proxy-init/redirect-all-incoming-to-proxy-port/1661306970\" -j REDIRECT --to-ports 4143\nCOMMIT\n# Completed on *** \n"

client linkerd-init logs on bad pod:

level=info msg="iptables-save -t nat"
level=info msg="# Generated by iptables-save v1.8.7 on *** 202y2\n*nat\n:PREROUTING ACCEPT [0:0]\n:INPUT ACCEPT [0:0]\n:OUTPUT ACCEPT [0:0]\n:POSTROUTING ACCEPT [0:0]\n:PROXY_INIT_OUTPUT - [0:0]\n:PROXY_INIT_REDIRECT - [0:0]\n-A PREROUTING -m comment --comment \"proxy-init/install-proxy-init-prerouting/1661306640\" -j PROXY_INIT_REDIRECT\n-A PROXY_INIT_OUTPUT -m owner --uid-owner 2102 -m comment --comment \"proxy-init/ignore-proxy-user-id/1661306640\" -j RETURN\n-A PROXY_INIT_OUTPUT -o lo -m comment --comment \"proxy-init/ignore-loopback/1661306640\" -j RETURN\n-A PROXY_INIT_OUTPUT -p tcp -m multiport --dports 4567,4568 -m comment --comment \"proxy-init/ignore-port-4567,4568/1661306640\" -j RETURN\n-A PROXY_INIT_OUTPUT -p tcp -m comment --comment \"proxy-init/redirect-all-outgoing-to-proxy-port/1661306640\" -j REDIRECT --to-ports 4140\n-A PROXY_INIT_REDIRECT -p tcp -m multiport --dports 4190,4191,4567,4568 -m comment --comment \"proxy-init/ignore-port-4190,4191,4567,4568/1661306640\" -j RETURN\n-A PROXY_INIT_REDIRECT -p tcp -m comment --comment \"proxy-init/redirect-all-incoming-to-proxy-port/1661306640\" -j REDIRECT --to-ports 4143\nCOMMIT\n# Completed on *** 2022\n"
level=info msg="skipping iptables setup: found 1 existing chains"

client linkerd-proxy logs same on both good and bad pods:

{"level":"INFO","fields":{"message":"Using single-threaded proxy runtime"},"target":"linkerd2_proxy::rt","threadId":"ThreadId(1)"}
{"level":"INFO","fields":{"message":"Admin interface on 0.0.0.0:4191"},"target":"linkerd2_proxy","threadId":"ThreadId(1)"}
{"level":"INFO","fields":{"message":"Inbound interface on 0.0.0.0:4143"},"target":"linkerd2_proxy","threadId":"ThreadId(1)"}
{"level":"INFO","fields":{"message":"Outbound interface on 127.0.0.1:4140"},"target":"linkerd2_proxy","threadId":"ThreadId(1)"}
{"level":"INFO","fields":{"message":"Tap DISABLED"},"target":"linkerd2_proxy","threadId":"ThreadId(1)"}
{"level":"INFO","fields":{"message":"Local identity is client-sa.client.serviceaccount.identity.linkerd.cluster.local"},"target":"linkerd2_proxy","threadId":"ThreadId(1)"}
{"level":"INFO","fields":{"message":"Identity verified via linkerd-identity-headless.linkerd.svc.cluster.local:8080 (linkerd-identity.linkerd.serviceaccount.identity.linkerd.cluster.local)"},"target":"linkerd2_proxy","threadId":"ThreadId(1)"}
{"level":"INFO","fields":{"message":"Destinations resolved via linkerd-dst-headless.linkerd.svc.cluster.local:8086 (linkerd-destination.linkerd.serviceaccount.identity.linkerd.cluster.local)"},"target":"linkerd2_proxy","threadId":"ThreadId(1)"}
{"level":"INFO","fields":{"message":"Certified identity: client-sa.client.serviceaccount.identity.linkerd.cluster.local"},"target":"linkerd_app","spans":[{"name":"daemon"},{"name":"identity"}],"threadId":"ThreadId(2)"}

server linkerd-proxy logs:

"level":"DEBUG","fields":{"message":"Connection closed"},"target":"linkerd_app_core::serve","spans":[{"name":"inbound"},{"client.addr":"10.200.0.18:59070","name":"accept"}],"threadId":"ThreadId(1)"}
"level":"DEBUG","fields":{"message":"parsed 5 headers"},"target":"hyper::proto::h1::io","spans":[{"name":"inbound"},{"client.addr":"172.31.6.164:49924","name":"accept"},{"port":8080,"name":"server"}],"threadId":"ThreadId(1)"}
"level":"DEBUG","fields":{"message":"incoming body is content-length (66 bytes)"},"target":"hyper::proto::h1::conn","spans":[{"name":"inbound"},{"client.addr":"172.31.6.164:49924","name":"accept"},{"port":8080,"name":"server"}],"threadId":"ThreadId(1)"}
"level":"INFO","fields":{"message":"Request denied","server":"server-api","tls":"None(NoClientHello)","client":"172.31.6.164:49924"},"target":"linkerd_app_inbound::policy::authorize::http","spans":[{"name":"inbound"},{"client.addr":"172.31.6.164:49924","name":"accept"},{"port":8080,"name":"server"},{"v":"1.x","name":"http"}],"threadId":"ThreadId(1)"}
"level":"DEBUG","fields":{"message":"incoming body completed"},"target":"hyper::proto::h1::conn","spans":[{"name":"inbound"},{"client.addr":"172.31.6.164:49924","name":"accept"},{"port":8080,"name":"server"}],"threadId":"ThreadId(1)"}
"level":"INFO","fields":{"message":"Request failed","error":"unauthorized connection on server server-api"},"target":"linkerd_app_core::errors::respond","spans":[{"name":"inbound"},{"client.addr":"172.31.6.164:49924","name":"accept"},{"port":8080,"name":"server"},{"v":"1.x","name":"http"},{"client.addr":"172.31.6.164:49924","name":"rescue"}],"threadId":"ThreadId(1)"}
"level":"DEBUG","fields":{"message":"Handling error on HTTP connection","status":"403 Forbidden","version":"HTTP/1.1","close":"false"},"target":"linkerd_app_core::errors::respond","spans":[{"name":"inbound"},{"client.addr":"172.31.6.164:49924","name":"accept"},{"port":8080,"name":"server"},{"v":"1.x","name":"http"}],"threadId":"ThreadId(1)"}
"level":"DEBUG","fields":{"message":"flushed 82 bytes"},"target":"hyper::proto::h1::io","spans":[{"name":"inbound"},{"client.addr":"172.31.6.164:49924","name":"accept"},{"port":8080,"name":"server"}],"threadId":"ThreadId(1)"}

output of linkerd check -o short

(Some version incompatibilities, but nothing looks related.)

Linkerd core checks
===================

kubernetes-version
------------------
× is running the minimum kubectl version
    kubectl is on version [1.15.10], but version [1.16.0] or more recent is required
    see https://linkerd.io/2.11/checks/#kubectl-version for hints

linkerd-version
---------------
‼ cli is up-to-date
    is running version 2.11.0 but the latest stable version is 2.12.0
    see https://linkerd.io/2.11/checks/#l5d-version-cli for hints

control-plane-version
---------------------
‼ control plane is up-to-date
    is running version 2.11.2 but the latest stable version is 2.12.0
    see https://linkerd.io/2.11/checks/#l5d-version-control for hints
‼ control plane and cli versions match
    control plane running stable-2.11.2 but cli running stable-2.11.0
    see https://linkerd.io/2.11/checks/#l5d-version-control for hints

linkerd-control-plane-proxy
---------------------------
‼ control plane proxies are up-to-date
    some proxies are not running the current version:
        * linkerd-destination-5dcf8c8ffc-2cmrg (2.11.2)
        * linkerd-destination-5dcf8c8ffc-klgz9 (2.11.2)
        * linkerd-destination-5dcf8c8ffc-npd6v (2.11.2)
        * linkerd-identity-85ffb786d6-n76jj (2.11.2)
        * linkerd-identity-85ffb786d6-vg6fz (2.11.2)
        * linkerd-identity-85ffb786d6-xs5kj (2.11.2)
        * linkerd-proxy-injector-6864bb999b-8p8bl (2.11.2)
        * linkerd-proxy-injector-6864bb999b-jj77w (2.11.2)
        * linkerd-proxy-injector-6864bb999b-lxb8l (2.11.2)
    see https://linkerd.io/2.11/checks/#l5d-cp-proxy-version for hints
‼ control plane proxies and cli versions match
    linkerd-destination-5dcf8c8ffc-2cmrg running 2.11.2 but cli running stable-2.11.0
    see https://linkerd.io/2.11/checks/#l5d-cp-proxy-cli-version for hints

Status check results are ×

Environment

  • Kubernetes Version: v1.20.15
  • Cluster Environment: Azure
  • Host OS: Ubuntu 18.04.6 LTS
  • Linkerd Version: 2.11.2
  • Kernel Version: 5.4.0-1086-azure
  • Container Runtime Version: docker://20.10.8

Possible solution

Aside from a full root cause and bug fix, I would like to see a boolean option, set via env var / annotation handling, e.g. PROXY_INIT_OVERWRITE, to always overwrite existing iptables chains. Curious if others would find this useful. It would simply skip this check.
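
A minimal sketch of the kind of escape hatch I have in mind, assuming it wraps the chain-counting check that emits the "skipping iptables setup" message; the PROXY_INIT_OVERWRITE variable and all function names here are hypothetical, not the actual linkerd2-proxy-init code:

    package main

    import (
        "fmt"
        "os"
        "strings"
    )

    // existingProxyInitChains counts PROXY_INIT_* chain declarations in
    // `iptables-save -t nat` output (lines like ":PROXY_INIT_REDIRECT - [0:0]").
    // Illustrative only; the real check may differ.
    func existingProxyInitChains(save string) int {
        n := 0
        for _, line := range strings.Split(save, "\n") {
            if strings.HasPrefix(line, ":PROXY_INIT_") {
                n++
            }
        }
        return n
    }

    // shouldSkipSetup is the skip decision plus the proposed override: when
    // PROXY_INIT_OVERWRITE=true, (re)install the rules unconditionally.
    func shouldSkipSetup(save string) bool {
        if os.Getenv("PROXY_INIT_OVERWRITE") == "true" {
            return false
        }
        if n := existingProxyInitChains(save); n > 0 {
            fmt.Printf("skipping iptables setup: found %d existing chains\n", n)
            return true
        }
        return false
    }

    func main() {
        save := ":PREROUTING ACCEPT [0:0]\n:PROXY_INIT_REDIRECT - [0:0]\n"
        fmt.Println("skip setup:", shouldSkipSetup(save))
    }

Overwriting would also have to flush the stale chains before re-adding rules, since iptables -N fails if the chain already exists.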

Would also like more context in logs as to why certain requests get rejected.

Additional context

What makes this particularly difficult to debug is that:

  1. Reproducibility is very low.
  2. Server linkerd-proxy does not suggest why it is 403'ing the request (even with debug or trace proxy log levels).
  3. Client linkerd-proxy shows no sign of failure even when its requests are getting rejected.
  4. Attempting to change client linkerd-proxy verbosity or adding a debugging sidecar will create a new pod, and the issue will likely be fixed.

The bad pod's linkerd-init args:

    Args:
      --incoming-proxy-port
      4143
      --outgoing-proxy-port
      4140
      --proxy-uid
      2102
      --inbound-ports-to-ignore
      4190,4191,4567,4568
      --outbound-ports-to-ignore
      4567,4568

Possibly related issue(s): https://github.com/linkerd/linkerd2/issues/5786

Would you like to work on fixing this bug?

maybe

mattstam avatar Aug 25 '22 18:08 mattstam

I'm not sure I understand the situation where the proxy-init container is running but finding existing proxy rules. #5786 could explain it, but do you think that is likely on your cluster?

Since you mention this has low reproducibility, it'd be helpful if you can change the proxy-init container's logs to debug (proxyInit.logLevel='debug'). That would at least let us see the old rules that it's finding on the Pod (debug log line). From there we could determine if these rules are getting unexpectedly changed, or if they are correct and this happens to be unrelated.

I know you mention this will require restarting the Pod which would likely fix the issue when it happens, but it's important to see what the found rules are so that we can determine if this is actually the issue.

Aside from that, is there anything else unique about your cluster or application being injected? Is anything running on the host network?

kleimkuhler avatar Aug 26 '22 20:08 kleimkuhler

Nothing unexpected about the cluster IMO, just kube-proxy running in host network.

These pods will get restarted periodically, so most likely what happens is that a client pod (with linkerd-init container) lands on a node that previously had a linkerd-init set up the rules. https://github.com/linkerd/linkerd2/issues/5786 sounds like this skipping is expected behavior, which is understandable assuming nothing needs to update.

The suspicion here is that an old, related chain exists but is not configured in the correct way and overwriting it would correct these issues. Does this scenario sound possible?

It's certainly a hunch, and it could be correlated with a different issue: all this might really signal is that something breaks when a new injected pod lands on a node that previously had one. And even that situation doesn't repro 100%.

I'd be open to pre-emptively running this debug mode on our clusters to help find a root cause if needed. However, given the large number of clusters and the length of time they would need to be kept in this state to reproduce, is the potential perf impact something we need to worry about? Do we have data on the expected overhead of running in debug?

mattstam avatar Aug 26 '22 21:08 mattstam

These pods will get restarted periodically, so most likely what happens is that a client pod (with linkerd-init container) lands on a node that previously had a linkerd-init set up the rules.

If the Pods are not running on the host network, then linkerd-init's iptables commands only affect the Pod; the host Node won't have its iptables rules changed. That's what is surprising about your issue. As a Pod spins up for the first time — even if it's on a Node that previously hosted a similar Pod — it shouldn't observe any previously added Linkerd iptables rules.

That's why I'd like the logs on debug so that we can see what rules it's finding and if somehow the linkerd-init container is getting restarted after it's already run. Are the rules we are finding incorrect somehow or being changed? I'm not totally sure right now, but it'd be good to rule that out.

I'd be open to pre-emptively running this debug mode on our clusters to better find a root cause if it is needed. However, given the large amount of clusters and length of time it would need to be kept in this state to reproduce, is the potential perf impact is something we need to worry about?

Nope. The proxyInit.logLevel affects only the linkerd-init container, which produces a minimal amount of logs. Additionally, the container only runs once at Pod startup, so there shouldn't be continuous logs added. In fact, looking through the binary there are only 6 new log lines that would be added, none of which reveal sensitive information.

kleimkuhler avatar Aug 26 '22 23:08 kleimkuhler

That's an interesting point about hostNetwork not even being enabled on this pod, it would be hard to imagine how pre-existing iptables chains could ever exist in this scenario.

I will bump the log verbosity for the linkerd-init container and will report back when this reproduces, thanks!

mattstam avatar Aug 29 '22 17:08 mattstam

This issue has been automatically marked as stale because it has not had recent activity. It will be closed in 14 days if no further activity occurs. Thank you for your contributions.

stale[bot] avatar Nov 27 '22 18:11 stale[bot]

Not stale

mattstam avatar Nov 27 '22 20:11 mattstam

@kleimkuhler here are all the logs we can get out of the misbehaving init container:

time="2022-12-01T20:42:01Z" level=info msg="iptables-save -t nat"
time="2022-12-01T20:42:02Z" level=info msg="# Generated by iptables-save v1.8.7 on Thu Dec  1 20:42:01 2022\n*nat\n:PREROUTING ACCEPT [0:0]\n:INPUT ACCEPT [0:0]\n:OUTPUT ACCEPT [0:0]\n:POSTROUTING ACCEPT [0:0]\n:PROXY_INIT_OUTPUT - [0:0]\n:PROXY_INIT_REDIRECT - [0:0]\n-A PREROUTING -m comment --comment \"proxy-init/install-proxy-init-prerouting/1669927320\" -j PROXY_INIT_REDIRECT\n-A PROXY_INIT_OUTPUT -m owner --uid-owner 2102 -m comment --comment \"proxy-init/ignore-proxy-user-id/1669927320\" -j RETURN\n-A PROXY_INIT_OUTPUT -o lo -m comment --comment \"proxy-init/ignore-loopback/1669927320\" -j RETURN\n-A PROXY_INIT_OUTPUT -p tcp -m multiport --dports 4567,4568 -m comment --comment \"proxy-init/ignore-port-4567,4568/1669927320\" -j RETURN\n-A PROXY_INIT_OUTPUT -p tcp -m comment --comment \"proxy-init/redirect-all-outgoing-to-proxy-port/1669927320\" -j REDIRECT --to-ports 4140\n-A PROXY_INIT_REDIRECT -p tcp -m multiport --dports 4190,4191,4567,4568 -m comment --comment \"proxy-init/ignore-port-4190,4191,4567,4568/1669927320\" -j RETURN\n-A PROXY_INIT_REDIRECT -p tcp -m comment --comment \"proxy-init/redirect-all-incoming-to-proxy-port/1669927320\" -j REDIRECT --to-ports 4143\nCOMMIT\n# Completed on Thu Dec  1 20:42:02 2022\n"
time="2022-12-01T20:42:02Z" level=info msg="skipping iptables setup: found 1 existing chains"

This is on Linkerd v2.11.2. It still occurs very infrequently but consistently at scale.

At this point we would really be looking for a way to configure the init container such that it never skips iptables setup.

mattstam avatar Dec 05 '22 20:12 mattstam

Thanks for looping back to this. So I'm really interested in seeing the debug output for this log line. It should occur right before the log line that we currently see: skipping iptables setup: found 1 existing chains. Are you sure you've set the initContainer to log at debug?

I know it's odd that the misbehaving containers all seem to have this iptables rule issue, but I'm still not convinced that that is the exact issue here. Looking through the rules listed in the Generated by iptables-save ... line, they do all generally look correct.

I know you pasted the proxy container logs above, but are you sure that on the misbehaving Pods the linkerd-proxy container definitely has a verified identity (i.e. the Certified identity ... log line)?

I also still don't know why the iptables rules would already exist on these Pods; it does seem quite weird. Is there anything in the Pods' describe output that shows them being restarted or something?

kleimkuhler avatar Dec 06 '22 18:12 kleimkuhler

Yes, it turns out that the init container is restarting. I've been able to correlate the following observations over several instances where we've seen this bug happen:

  • When the "skipping iptables setup" log is written, the unix timestamps in the comments of the dumped rules consistently show that they were created one or two seconds earlier. For example, looking at the comment above, the log lines are at 2022-12-01T20:42:02Z, and the timestamp on the rules themselves is 1669927320, or 2022-12-01T20:42:00Z, two seconds prior (see the conversion snippet after this list).
  • We also see that the list of rules dumped is truncated compared to what should be present. Looking at the comment above, the dumped rules are missing the "install-proxy-init-output" rule, which is the last one that ConfigureFirewall() would configure. This corresponds with the typical failure mode that we see, which is that output connections are not proxied.
  • I can also see kubelet evidence of the init container being run twice in close succession, and audit evidence of the incomplete set of iptables execve() calls being made the first time around. Irritatingly, we have a logging issue, which means I don't have the failed container's output. I also don't see decisive evidence of the container either failing or being killed, so I'm not sure exactly what's happening.
  • Interestingly, all of the instances of this failure that we've seen happen at one or two seconds after the minute.
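
To sanity-check that timing, here's the conversion of the unix timestamp embedded in the rule comments (plain timestamp arithmetic, nothing linkerd-specific):

    package main

    import (
        "fmt"
        "time"
    )

    func main() {
        // Timestamp from the rule comments in the dump above
        // (e.g. proxy-init/ignore-loopback/1669927320).
        created := time.Unix(1669927320, 0).UTC()
        fmt.Println(created.Format(time.RFC3339)) // prints 2022-12-01T20:42:00Z
    }

The "skipping iptables setup" line was logged at 2022-12-01T20:42:02Z, two seconds after the rules it found were created.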

While the root cause isn't clear, I feel that there's a bug in the "skipping iptables setup" handling, because it assumes that the existence of a PROXY_INIT_OUTPUT or PROXY_INIT_REDIRECT chain implies that the rules are intact, and clearly here they are not. The Kubernetes documentation says that init containers should be idempotent, and this functionality is not.
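
As a rough illustration of what idempotent handling could look like (a sketch only, assuming a thin wrapper around the iptables binary; resetChain and the helper are mine, not the contents of the eventual PR):

    package main

    import (
        "log"
        "os/exec"
    )

    // iptables runs one command against the nat table, logging it in the same
    // style as proxy-init.
    func iptables(args ...string) error {
        args = append([]string{"-t", "nat"}, args...)
        log.Printf("iptables %v", args)
        return exec.Command("iptables", args...).Run()
    }

    // resetChain makes chain creation idempotent: flush and delete any chain
    // left over from an earlier (possibly interrupted) run before recreating
    // it, rather than treating its mere existence as proof the rules are intact.
    func resetChain(chain string) error {
        // -F and -X fail harmlessly when the chain doesn't exist yet. Note that
        // -X also fails while a jump rule (e.g. from PREROUTING or OUTPUT)
        // still references the chain, so a real fix would delete those first.
        _ = iptables("-F", chain)
        _ = iptables("-X", chain)
        return iptables("-N", chain)
    }

    func main() {
        for _, chain := range []string{"PROXY_INIT_REDIRECT", "PROXY_INIT_OUTPUT"} {
            if err := resetChain(chain); err != nil {
                log.Fatalf("failed to reset chain %s: %v", chain, err)
            }
            // ...then re-add the chain's rules, as ConfigureFirewall() does today.
        }
    }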

I'll open a PR with a suggested improvement.

jim-minter avatar Jan 06 '23 23:01 jim-minter