squid icon indicating copy to clipboard operation
squid copied to clipboard

Optimize ephemeral port reuse with IP_BIND_ADDRESS_NO_PORT

Open pponakan opened this issue 2 years ago • 16 comments

commBind: Cannot bind socket ... : (98) Address already in use

When opening a TCP connection from a specific IP address (e.g., set by tcp_outgoing_address), Squid usually lets the OS select the source port by performing the traditional bind(2)+connect(2) sequence with zero source port. This sequence consumes ephemeral ports at the connection opening rate even if many destinations are unique (and, hence, could reuse the same source port) because the OS must pick a source port at bind(2) time, before connect(2) supplies the destination address.

Starting with v4.2, Linux supports IP_BIND_ADDRESS_NO_PORT socket option designed to decrease ephemeral port consumption by delaying port binding until connect(2) time. If available, Squid now uses that option when opening any outgoing TCP connection bound to a source IP address.

pponakan avatar Aug 08 '22 23:08 pponakan

Can one of the admins verify this patch?

squid-prbot avatar Aug 08 '22 23:08 squid-prbot

Thank you very much for this contribution! While the current implementation cannot be accepted (for the reasons identified in specific change requests), it sounds like a useful feature that we should help you fix and accept. Would you be willing to address the blocking problems?

I am ignoring minor code and PR formatting imperfections for now. We can easily polish this after the bigger problems are solved.

Sure, glad to help out and please do point out any code paths that might break due to this fix. P.S. If the code diff looks good, I would like do some soak testing in our env before merging.

pponakan avatar Aug 09 '22 06:08 pponakan

P.S. If the code diff looks good, I would like do some soak testing in our env before merging.

You can retest at any time, of course. I will let you know when I think the code is ready for another round of tests. We are not there yet.

rousskov avatar Aug 09 '22 17:08 rousskov

Apologies if I still have any formatting issues or am not following some specific coding guidelines. I was not able to get the scripts/format-cpp.pl script to test my code diff for formatting. If there is anything else I should change, I would be glad to clean up. BTW, I wasnt sure whether I should use CamelCase or C-style naming for the private method that was added to comm.cc, as the file has a mix of both formats.

pponakan avatar Aug 09 '22 23:08 pponakan

Excellent progress, @pponakan, thank you! Please give me a day to do a polishing round. Do not worry about formatting and various gray areas (in this inconsistent official code) -- we will take care of that. The ball is on my side now.

rousskov avatar Aug 10 '22 03:08 rousskov

Thanks @rousskov! I will run a canary test with this diff in our env to verify it still works as the initial patch. A simple test would be to reduce the net.ipv4.ip_local_port_range to a very small (maybe 10 ports) range and attempt to have squid make 11 or more concurrent sessions to different dest addresses with a single tcp_outgoing_address configured.

pponakan avatar Aug 10 '22 06:08 pponakan

OK to test

rousskov avatar Aug 10 '22 23:08 rousskov

@pponakan, I am done with code changes for now. I tried to document most changes in commit messages. Please review and test. The changes took a somewhat unexpected turn after I cleaned up the code a bit. I hope I did not screw something up and you like the end result. Please adjust as needed, including reversal of anything and everything!

I will polish the PR title/description later -- out of time for now. Do not worry about the M-failed-description label until then.

rousskov avatar Aug 10 '22 23:08 rousskov

Thanks @rousskov, the diff looks fine IMO. I will use this to build a new image for our canary test. Will keep you posted on how the test is doing after letting it soak for about 24 hrs.

pponakan avatar Aug 11 '22 02:08 pponakan

Thanks. BTW when I pull this branch and attempt to run autoconf, I get the following error on an ubuntu system. I am trying to build squid from the latest commit with openssl 1.1.1n

configure.ac:13: error: possibly undefined macro: AM_INIT_AUTOMAKE If this token and others are legitimate, please use m4_pattern_allow. See the Autoconf documentation. configure.ac:16: error: possibly undefined macro: AM_MAINTAINER_MODE configure.ac:42: error: possibly undefined macro: AM_PROG_CC_C_O configure.ac:129: error: possibly undefined macro: AM_CONDITIONAL

pponakan avatar Aug 11 '22 03:08 pponakan

You may need to update or install some missing build dependencies. On Ubuntu that can be done with:

  apt-get update
  apt-get build-dep squid

(you may need to add the deb-src line for your Ubuntu version to APT sources.list)

Then re-bootstrap the toolchain:

  ./bootstrap.sh
  ./configure ......
  make

If that does not fix the build, what version of Ubuntu, automake and autoconf are you building with?

yadij avatar Aug 11 '22 04:08 yadij

Thanks @yadij, I had forgotten to run ./bootstrap.sh, that fixed the build. I am on ubuntu 20.04. However while running tests with a small local_port range I couldnt find any evidence of source port reuse on outbound TCP connections. I might have to recheck whether I did something wrong with the build and ended up not picking up the full diff (upto commit id 6cfbf233f3f2563a4cad627c229ca7fa8f6fbf62). I Will continue testing tomorrow.

pponakan avatar Aug 11 '22 07:08 pponakan

Update on what I have been able to test. I have confirmed my build steps are pulling in the specific commits into the build image.

I went back to the very first commit on this PR and was able to observe the intended behavior. For example with local_port range reduced to 32000-32010 the following TCP sessions appear in netstats, where A.B.C.D is the tcp_outgoing_address and X.X.X.X, Y.Y.Y.Y are 2 dest IPs being accessed via the squid proxy.

tcp 0 0 A.B.C.D:32008 X.X.X.X:80 ESTABLISHED tcp 0 0 A.B.C.D:32004 Y.Y.Y.Y:80 ESTABLISHED tcp 0 0 A.B.C.D:32010 X.X.X.X:80 ESTABLISHED tcp 0 0 A.B.C.D:32004 X.X.X.X:80 ESTABLISHED tcp 0 0 A.B.C.D:32007 X.X.X.X:80 ESTABLISHED tcp 0 0 A.B.C.D:32006 Y.Y.Y.Y:80 ESTABLISHED tcp 0 0 A.B.C.D:32002 X.X.X.X:80 ESTABLISHED tcp 0 0 A.B.C.D:32000 Y.Y.Y.Y:80 ESTABLISHED tcp 0 0 A.B.C.D:32003 X.X.X.X:80 ESTABLISHED tcp 0 0 A.B.C.D:32008 Y.Y.Y.Y:80 ESTABLISHED tcp 0 0 A.B.C.D:32006 X.X.X.X:80 ESTABLISHED tcp 0 0 A.B.C.D:32000 X.X.X.X:80 ESTABLISHED tcp 0 0 A.B.C.D:32002 Y.Y.Y.Y:80 ESTABLISHED tcp 0 0 A.B.C.D:32009 X.X.X.X:80 ESTABLISHED

I picked a few commits midway (my last commit 64d195b552d07e27bc81868032372fe5d46a43ed and 6a093ae6e67e323165e3c7c08cb21b280e09092f). Both seem to have issues as I only see 4 kid processes (we have 30 in the config) and source port reuse was not very evident, some errors from cache.log below. I will switch back to the latest commit and try again.

2022/08/11 17:01:14 kid8| ERROR: commBind Cannot bind socket FD 11 to [::]: (98) Address already in use 2022/08/11 17:01:14 kid5| ERROR: commBind Cannot bind socket FD 11 to [::]: (98) Address already in use 2022/08/11 17:01:14 kid5| ERROR: commBind Cannot bind socket FD 12 to 0.0.0.0: (98) Address already in use

pponakan avatar Aug 11 '22 17:08 pponakan

Can ignore the errors mentioned on my previous message. They donot appear if the local_port range is reduced after squid with 30 workers is brought up. Confirmed latest commit does not show reuse of outbound TCP source ports and a local health checking service on the same host as squid also runs out of ports that causes incorrect health status of squid to be reported (which was observed with the official 5.x versions that led us to pursue this patch).

pponakan avatar Aug 11 '22 18:08 pponakan

Confirmed latest commit does not show reuse of outbound TCP source ports

Please see my earlier review for that bug triage and suggested fixes.

rousskov avatar Aug 11 '22 20:08 rousskov

FYI, I just retested commit 6a093ae6e67e323165e3c7c08cb21b280e09092f by reducing the local_port range after squid is fully up. It shows source port reuse as well as the health checking service continues to function while all ports within the ephemeral port range are in use (with different 4-tuples).

pponakan avatar Aug 11 '22 20:08 pponakan

Confirmed latest commit does not show reuse of outbound TCP source ports

Please see my earlier review for that bug triage and suggested fixes.

@rousskov I have built squid with the fix suggested here and run a spot test with a small local_port range. The image shows clear evidence of source port reuse in our env. I will wait till this potential solution is fully reviewed before starting a canary test with it next week.

pponakan avatar Aug 13 '22 19:08 pponakan

Thanks for the last commit d96d055e52a5b91900540941a2d172c830105b23 to the PR. If we need to wait for further approvals, can you please let me know so that I can plan for a canary test with the latest diff.

BTW, will this fix be made available with squid version 5.x as well (if 6.0 is the intended release)? Just to have a fallback in case we run into problems with version 6.0.

pponakan avatar Aug 17 '22 01:08 pponakan

Thanks for the last commit d96d055 to the PR. If we need to wait for further approvals, can you please let me know so that I can plan for a canary test with the latest diff.

Please keep in mind that Squid PR reviews are not carefully coordinated by some central authority -- no one can predict exactly what will happen to an open PR. FWIW, my plan is to clear this PR for merging in a few days if you do not report test failures and there are no new change requests. Clearing a PR for merging does not guarantee anything either -- a last minute blocking review can stop the automated merge sequence...

  • If you would like to minimize the number of tests, then you should wait until the PR has no negative reviews and is either at least ten days old or has at least two positive reviews. The "PR approval" status check at the bottom of the PR GitHub page will turn green when this condition is reached.
  • If you want to test every (possibly temporary) cleared candidate, then please test now.

I would also recommend asking us not to merge the PR until your tests are over. We do not want to forget about this PR, but we are not in a rush to land it either and would be happy to wait for your green light. BTW, thank you for testing changes!

BTW, will this fix be made available with squid version 5.x as well?

I cannot speak for the v5 maintainer -- @yadij will make that call. Please do not misinterpret my answer as agreeing (or disagreeing) with you that the proposed change is a "fix" ;-).

rousskov avatar Aug 17 '22 02:08 rousskov

Thanks for explaining the merge process.

  • If you would like to minimize the number of tests, then you should wait until the PR has no negative reviews and is either at least ten days old or has at least two positive reviews. The "PR approval" status check at the bottom of the PR GitHub page will turn green when this condition is reached.

I can actually start off 2 canaries, one that receives a smaller amount traffic and later another canary that usually exhausts all ephemeral ports during peak traffic w/o this socket option applied. I'll start off the 1st canary tomorrow and observe it for a few days. Based on how that goes and whether new changes are required, I will then plan on starting the more extensive 2nd canary early next week.

I cannot speak for the v5 maintainer -- @yadij will make that call.

Thanks.

Please do not misinterpret my answer as agreeing (or disagreeing) with you that the proposed change is a "fix" ;-).

Hopefully this PR enhances the scaling aspect w/o breaking other existing functionality. :)

pponakan avatar Aug 17 '22 03:08 pponakan

Hopefully this PR enhances the scaling aspect w/o breaking other existing functionality. :)

Yes, that is my hope or understanding as well.

rousskov avatar Aug 17 '22 13:08 rousskov

Update on testing. Canary-1 has been running fine with the latest diff and shows a small number of source ports being reused even though the number of concurrent TCP sessions is only about 2-4% of the system's ip_local_port_range. Will continue with the plan of running the latest commit with canary-2 next week unless anything changes.

pponakan avatar Aug 19 '22 05:08 pponakan

The 2nd canary has been running fine with the same image over the last few days. The peak concurrent outbound session count on this canary went beyond 2x the ip_local_port_range w/o any problems binding to ephemeral ports.

pponakan avatar Aug 23 '22 00:08 pponakan