
Fix corrupted hostname from partial connection read

chotiwat opened this pull request 1 year ago • 20 comments

What this PR does / why we need it:

We have run into intermittent "HTTP request sent to an HTTPS server" errors from our SSL passthrough ingresses. We found that the passthrough proxy would sometimes read incomplete data from the connection, resulting in a corrupted hostname.

When this happens, the connection handler falls back to the default server at 127.0.0.1:442, which causes the error if the nginx.ingress.kubernetes.io/backend-protocol: HTTPS annotation is not specified.

This PR fixes the issue by reading the total Client Hello length from the TLS record header and making sure the full payload is read from the connection.
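
For illustration, here is a minimal sketch of the general idea (not the actual diff in this PR; the function name and error handling are placeholders): read the 5-byte TLS record header, take the record length from bytes 3-4, and keep reading until the whole Client Hello record is buffered rather than trusting whatever a single read returns.

```go
// Sketch only: buffer a complete TLS record instead of a single partial read.
package tlssketch

import (
	"fmt"
	"io"
	"net"
)

func readClientHelloRecord(conn net.Conn) ([]byte, error) {
	// The TLS record header is 5 bytes: type, version (2 bytes), length (2 bytes).
	header := make([]byte, 5)
	if _, err := io.ReadFull(conn, header); err != nil {
		return nil, fmt.Errorf("reading TLS record header: %w", err)
	}
	// Bytes 3 and 4 hold the record length, big-endian.
	recordLen := int(header[3])<<8 | int(header[4])
	payload := make([]byte, recordLen)
	// io.ReadFull keeps reading until the whole record is buffered.
	if _, err := io.ReadFull(conn, payload); err != nil {
		return nil, fmt.Errorf("reading TLS record payload: %w", err)
	}
	return append(header, payload...), nil
}
```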

Types of changes

  • [x] Bug fix (non-breaking change which fixes an issue)
  • [ ] New feature (non-breaking change which adds functionality)
  • [ ] CVE Report (Scanner found CVE and adding report)
  • [ ] Breaking change (fix or feature that would cause existing functionality to change)
  • [ ] Documentation only

Which issue/s this PR fixes

How Has This Been Tested?

We built a custom image to debug the issue and added some debug logs. For example, we updated the log statement at https://github.com/kubernetes/ingress-nginx/blob/4bac1200bfe4c95f654ffb86bea18ce9fc1c8630/pkg/tcpproxy/tcp.go#L74 to:

klog.V(4).InfoS("TLS Client Hello", "host", hostname, "conn", fmt.Sprintf("%p", conn), "read-length", length, "total-length", int(data[3])<<8+int(data[4]))

Example log of corrupted data before the fix:

I0928 01:42:51.621260       7 tcp.go:74] "TLS Client Hello" host="\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00" conn="0xc002db0500" read-length=244 total-length=330

We can reproduce the issue by running curl in a loop. From my machine, around 5% of requests fail, and as few as 10 requests are enough to trigger the error. This fix eliminates the errors completely for us.
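
For reference, a rough Go equivalent of that curl loop is sketched below. The hostname is a placeholder, certificate verification is skipped, and the failure check is simplified to "any transport error or 4xx/5xx response"; the original reproduction simply ran curl in a shell loop.

```go
// Hypothetical reproduction loop against an SSL-passthrough ingress host.
package main

import (
	"crypto/tls"
	"fmt"
	"io"
	"net/http"
)

func main() {
	const url = "https://passthrough.example.com/" // placeholder host
	client := &http.Client{Transport: &http.Transport{
		// Skip verification and disable keep-alives so every request performs
		// a fresh TLS handshake, like separate curl invocations (test-only).
		TLSClientConfig:   &tls.Config{InsecureSkipVerify: true},
		DisableKeepAlives: true,
	}}

	failures := 0
	for i := 0; i < 100; i++ {
		resp, err := client.Get(url)
		if err != nil {
			failures++
			continue
		}
		io.Copy(io.Discard, resp.Body)
		resp.Body.Close()
		if resp.StatusCode >= 400 {
			failures++
		}
	}
	fmt.Printf("failures: %d/100\n", failures)
}
```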

Checklist:

  • [ ] My change requires a change to the documentation.
  • [ ] I have updated the documentation accordingly.
  • [x] I've read the CONTRIBUTION guide
  • [ ] I have added unit and/or e2e tests to cover my changes.
  • [ ] All new and existing tests passed.

chotiwat avatar Sep 28 '23 03:09 chotiwat

Deploy Preview for kubernetes-ingress-nginx canceled.

Latest commit: 608f37f7ae09b2a8136913e3a0d3113d17cc5a2b
Latest deploy log: https://app.netlify.com/sites/kubernetes-ingress-nginx/deploys/65cff1b329a47f00089cf6ed

netlify[bot] avatar Sep 28 '23 03:09 netlify[bot]

This issue is currently awaiting triage.

If Ingress contributors determine this is a relevant issue, they will accept it by applying the triage/accepted label and provide further guidance.

The triage/accepted label can be added by org members by writing /triage accepted in a comment.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

k8s-ci-robot avatar Sep 28 '23 03:09 k8s-ci-robot

Hi @chotiwat. Thanks for your PR.

I'm waiting for a kubernetes member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work. Regular contributors should join the org to skip this step.

Once the patch is verified, the new status will be reflected by the ok-to-test label.

I understand the commands that are listed here.


k8s-ci-robot avatar Sep 28 '23 03:09 k8s-ci-robot

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: chotiwat. Once this PR has been reviewed and has the lgtm label, please assign cpanato for approval. For more information see the Kubernetes Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment. Approvers can cancel approval by writing /approve cancel in a comment.

k8s-ci-robot avatar Sep 28 '23 03:09 k8s-ci-robot

/retest

chotiwat avatar Sep 29 '23 19:09 chotiwat

@chotiwat: Cannot trigger testing until a trusted user reviews the PR and leaves an /ok-to-test message.

In response to this:

/retest


k8s-ci-robot avatar Sep 29 '23 19:09 k8s-ci-robot

Hi @cpanato @tao12345666333, is there anything I need to do on my end to get this PR reviewed?

chotiwat avatar Oct 05 '23 01:10 chotiwat

Hi @rikatz, would you be able to take a look at this PR when you get a chance?

chotiwat avatar Oct 24 '23 20:10 chotiwat

Hi @rikatz, ping on this issue 🙏

chotiwat avatar Feb 05 '24 20:02 chotiwat

/retest

strongjz avatar Feb 09 '24 02:02 strongjz

  • Which issue number shows the corrupted hostname
  • Can the corrupted hostname problem be reproduced
  • Why does the problem NOT occur for multiple users of ssl-passthrough
  • What are details of why a backend-protocol is in play when ssl traffic is directly passed through to backend
  • Are there any tests written for this change
  • Is any e2e test suite going to be added or modified to check this change's impact

longwuyuan avatar Feb 09 '24 02:02 longwuyuan

Hi @longwuyuan, I've answered your questions below:

  • Which issue number shows the corrupted hostname

None. I couldn't find any issue describing the same problem.

  • Can the corrupted hostname problem be reproduced

Yes, as described in the PR body.

  • Why does the problem NOT occur for multiple users of ssl-passthrough

I wonder that myself. I can hazard a few guesses:

  • SSL passthrough might not be a widely used feature
  • The errors might have been disregarded by others if they aren't violating their SLAs
  • Retry mechanisms might have masked the errors
  • People might not care enough to report the issue

This doesn't mean the problem doesn't occur for other users of SSL passthrough though.

  • What are details of why a backend-protocol is in play when ssl traffic is directly passed through to backend

The connection handler for the proxy falls back to the default NGINX backend (p.Default) when the Client Hello hostname is corrupted, because it cannot be found in p.ServerList.

When this happens, the default backend will terminate the TLS and send a new request to the upstream per its generated NGINX configuration block, hence the HTTP-sent-to-HTTPS error.

Setting backend-protocol to HTTPS is just a hack to at least avoid the error, but it's not a real passthrough since the TLS gets terminated then re-initiated.

https://github.com/kubernetes/ingress-nginx/blob/86f3af8dead82f2d0905dceddeba213751e15b50/pkg/tcpproxy/tcp.go#L71-L76

https://github.com/kubernetes/ingress-nginx/blob/86f3af8dead82f2d0905dceddeba213751e15b50/pkg/tcpproxy/tcp.go#L43-L56
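
For non-developers following along, here is a heavily simplified illustration of that flow. It is not the verbatim source linked above; the type and field names only approximate the real tcpproxy code.

```go
// Simplified sketch of the fallback described above: if the hostname parsed
// from the Client Hello matches no passthrough server, the connection goes to
// the default NGINX server, which terminates TLS instead of passing it through.
package tlssketch

type tcpServer struct {
	Hostname string
	IP       string
	Port     int
}

type tcpProxy struct {
	ServerList []*tcpServer
	Default    *tcpServer // the local NGINX default server (127.0.0.1:442)
}

func (p *tcpProxy) lookup(host string) *tcpServer {
	for _, s := range p.ServerList {
		if s.Hostname == host {
			return s // true passthrough: raw TLS bytes are piped to this backend
		}
	}
	// A corrupted hostname never matches, so the connection lands here and is
	// TLS-terminated by the default server, then re-proxied to the upstream.
	return p.Default
}
```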

  • Are there any tests written for this change

No, there isn't currently. I could add some though (see below).

  • Is any e2e test suite going to be added or modified to check this change's impact

I could modify the existing test suite if that sounds good to you. I didn't see it initially; it took some digging around the docs to find.

chotiwat avatar Feb 12 '24 20:02 chotiwat

  • Can you create an issue for this and link it here? If you do, then ensuring that the issue description helps a developer of the ingress-nginx controller reduce the work of reproducing the issue will go a long way. I am still unclear on how I would reproduce the issue if I wanted to
  • The hope against hope is that there are at least a few other users who face the same issue, because the fact is that there is a large number of users of the ssl-passthrough feature, and why nobody else has reported this until now is extremely odd
  • I am not a developer, so I am lost at the "connection handler" info you provided and am waiting for a developer to comment on that. My limit is that I fail to understand why a new backend HTTPS connection, from the controller to the backend pod, is even being stated as occurring, when the ssl-passthrough annotation implies that termination of the connection from the client is not on the controller but is instead passed directly through to the backend pod

longwuyuan avatar Feb 12 '24 22:02 longwuyuan

  • And the e2e tests are an absolute requirement I think, based on the latest information you provided
  • thanks for the contribution

longwuyuan avatar Feb 12 '24 22:02 longwuyuan

Ok, I'll go ahead and try to add an e2e test for this.

  • Can you create an issue for this and link it here? If you do, then ensuring that the issue description helps a developer of the ingress-nginx controller reduce the work of reproducing the issue will go a long way. I am still unclear on how I would reproduce the issue if I wanted to

I believe I've explained it as clearly as I could in this PR. If you insist, I can create an issue, which would be a copy of this PR body and my previous answers to your questions, but I personally don't see any point in doing so.

The mistake in the code should be pretty clear from a developer's point of view as well. Hopefully, I'll be able to come up with an e2e test that fails against the current main branch but passes on this branch. If that's still not enough for the developers, I can try to put together minimal reproduction steps.

  • I am not a developer so I am lost at the "connection handler" info you provided. So waiting for a developer to comment on that.

Yes, let's do that.

chotiwat avatar Feb 12 '24 23:02 chotiwat

@chotiwat thanks for the contribution. I am copy/pasting my comments on the other PR here as well

  • I meant for the change and the tests to be in the same PR. Sorry I was not clear on that.
  • Can you please squash your commits
  • I am not a developer but I am extremely skeptical of the approach and assumptions here. If I can't reproduce the half-read connection using a stress tool like "hey" or "httperf" or "locust" or other such, then my opinion is that the changes you propose should not be forced on all the other users (simply because nobody else is reporting this and we cannot reproduce at will, so why fix things that are not broken)
  • Maybe your use case and your app have relative uniqueness for the incomplete-read situation. Maybe your infra and networking contribute to the problem. If not, then I should be able to reproduce at will with kind/minikube on any infra anywhere.
  • Your comment related to I/O makes me think that some resource starvation could have contributed to the incomplete read, hence your changes should not go to all the users (simply because it is a bad idea to fix what is not broken for all other users)

longwuyuan avatar Feb 17 '24 00:02 longwuyuan

@longwuyuan

  • Can you please squash your commits

Of course. I can do that once the PR has been reviewed and I address all the comments.

  • I am not a developer but I am extremely skeptical of the approach and assumptions here. If I can't reproduce the half-read connection using a stress tool like "hey" or "httperf" or "locust" or other such, then my opinion is that the changes you propose should not be forced on all the other users (simply because nobody else is reporting this and we cannot reproduce at will, so why fix things that are not broken)
  • Maybe your use case and your app have relative uniqueness for the incomplete-read situation. Maybe your infra and networking contribute to the problem. If not, then I should be able to reproduce at will with kind/minikube on any infra anywhere.
  • Your comment related to I/O makes me think that some resource starvation could have contributed to the incomplete read, hence your changes should not go to all the users (simply because it is a bad idea to fix what is not broken for all other users)

Our use case is not unique at all. In fact, I believe it's one of the main reasons this SSL passthrough feature exists.

Tools like kind and minikube are great for testing Kubernetes functionality, but by no means can they represent production clusters with traffic going through multiple network interfaces. Yes, in a perfect world, I should be able to reproduce the issue on a kind cluster so everyone can try to reproduce it. I tried, but I couldn't. I cannot grant you access to the cluster where we can consistently reproduce the problem either, and it would be overkill for me to set up an EKS cluster just to convince you, especially when

  • the problem is evident just by reading the code
  • I've provided definitive proof reproducing the problem via the added e2e tests, which pass with this fix but fail without it

Here are a few resources that might be helpful for more productive conversations in the future:

  • https://man7.org/linux/man-pages/man2/read.2.html#RETURN_VALUE
    • On Linux/Unix systems, TCP connections are read with the read() system call. The call isn't guaranteed to return all of the requested bytes (see the short sketch after this list).

      It is not an error if this number is smaller than the number of bytes requested; this may happen for example because fewer bytes are actually available right now

  • https://okanexe.medium.com/the-complete-guide-to-tcp-ip-connections-in-golang-1216dae27b5a#:~:text=Handling%20TCP%20Packets,sender%20and%20receiver.
  • https://www.ibm.com/docs/en/ztpf/1.1.0.15?topic=sessions-ssl-record-format
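
To make the read(2) point concrete, here is a short generic Go sketch (nothing specific to this PR): a bare conn.Read may legitimately return fewer bytes than requested without any error, and io.ReadFull is the standard way to insist on an exact count.

```go
package tlssketch

import (
	"io"
	"net"
)

// readExactly returns exactly n bytes from conn or an error. A bare
// conn.Read(buf) may return after a single TCP segment with fewer than n
// bytes and no error, which is precisely the short-read case discussed above.
func readExactly(conn net.Conn, n int) ([]byte, error) {
	buf := make([]byte, n)
	if _, err := io.ReadFull(conn, buf); err != nil {
		return nil, err
	}
	return buf, nil
}
```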

chotiwat avatar Feb 17 '24 02:02 chotiwat

PR needs rebase.


k8s-ci-robot avatar Mar 08 '24 00:03 k8s-ci-robot