elks [net] SO_REUSEADDR status

Open Mellvik opened this issue 3 years ago • 1 comments

@ghaerr,

I was looking for a logical place to put this and couldn't find one, so I created a new issue.

While waiting for the SO_REUSEADDR fix, I figured it might be useful to report how far we've got with your other (related) fixes.

I ran a series of tests with ftpd in active mode yesterday, with no SO_REUSEADDR anywhere. Here's what I got, and it's pretty encouraging.

The logic in the active mode connection setup (where ftpd opens an outbound connection to the client) is to bind a socket to local port 20. If the bind fails and the error is EADDRINUSE, wait 1 second and try again, up to 10 times.
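The retry logic described above could be sketched roughly as follows. This is a minimal illustration using standard BSD socket calls, not the actual ftpd source; the helper name bind_data_port and its parameters are hypothetical.

```c
#include <errno.h>
#include <string.h>
#include <unistd.h>
#include <sys/types.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <arpa/inet.h>

/*
 * Hypothetical helper (not from the ftpd source): create a TCP socket and
 * bind it to the given local port, retrying while the port is still in use.
 * Returns the bound socket descriptor, or -1 with errno set.
 */
int bind_data_port(unsigned short port, int max_tries, unsigned int delay_sec)
{
    struct sockaddr_in local;
    int fd, tries, saved;

    fd = socket(AF_INET, SOCK_STREAM, 0);
    if (fd < 0)
        return -1;

    memset(&local, 0, sizeof(local));
    local.sin_family = AF_INET;
    local.sin_addr.s_addr = htonl(INADDR_ANY);
    local.sin_port = htons(port);

    for (tries = 0; tries < max_tries; tries++) {
        if (bind(fd, (struct sockaddr *)&local, sizeof(local)) == 0)
            return fd;              /* port was free (or has become free) */
        if (errno != EADDRINUSE)
            break;                  /* any other error is not worth retrying */
        sleep(delay_sec);           /* give the old connection time to expire */
    }

    saved = errno;                  /* close() may overwrite errno */
    close(fd);
    errno = saved;
    return -1;
}
```

With the values from the description, the call would be bind_data_port(20, 10, 1), followed by a connect() to the client's data address. Note that binding to port 20 normally requires privileges on most systems.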

The experience is as follows - transferring some 50 files back to back:

- For transfers FROM ELKS, the message "tcp: port 20 already bound, rejecting (use SO_REUSEADDR?)" appears about 75% of the time, and the next bind succeeds. There is never more than one retry.
- For transfers TO ELKS, there are no messages at all. The port reuse works fine every time, back to back and really fast.
The difference between the two presumably being that for outgoing transfers, there may still be data in the pipeline between the application (ftpd) and the wire, which there isn't when the data is incoming.

I may change the sleep(1) call on outbound transfers to using the usleep() call you suggested the other day, if that's useful. Otherwise just wait for the socket option to be fixed and this being the test case for that. Your take?

--M

Nov 23 '21 15:11 Mellvik

I'm glad to hear things are working well, even though a one second sleep is required between file transfers. This would be a good test for the new receive window advertisement change in #1025, at some point also.

> Otherwise just wait for the socket option to be fixed and this being the test case for that.

Agreed.

> I may change the sleep(1) call on outbound transfers to using the usleep() call you suggested the other day, if that's useful.

Probably not, see below.

> The difference between the two presumably being that for outgoing transfers, there may still be data in the pipeline between the application (ftpd) and the wire, which there isn't when the data is incoming.

To fully understand why the transfers work differently inbound vs outbound, and specifically why you get the port reuse message in one direction only, we need to go back to the TCP state diagram. In one direction, the socket is closed by ftpd first, which causes the state to transition through FIN_WAIT/CLOSING to TIME_WAIT. In the other direction, a FIN is received first, which causes the state to go through CLOSE_WAIT/LAST_ACK to closed, with no TIME_WAIT state.

In the latter case, no TCP control block (CB) structure remains; it was deallocated completely. So although all transfers reuse local port 20, there is no leftover CB for the next bind to conflict with. In the former case, the previous transfer's socket is in TIME_WAIT for one second, so the port reuse fails until it times out of TIME_WAIT. It's that simple.
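To make the two close paths concrete, here is a small sketch of the closing half of the standard TCP state diagram. It is a simplified model for illustration only, not ktcp code: the enum and function names are hypothetical, and the case where a FIN and the ACK of our FIN arrive in one segment is left out. Since FTP in stream mode signals end of file by closing the data connection, the "ftpd closes first" path is presumably the FROM ELKS case.

```c
/* Simplified model of TCP connection teardown; all names are hypothetical. */
enum tcp_state {
    ST_ESTABLISHED, ST_FIN_WAIT_1, ST_FIN_WAIT_2, ST_CLOSING,
    ST_TIME_WAIT, ST_CLOSE_WAIT, ST_LAST_ACK, ST_CLOSED
};

enum tcp_event {
    EV_APP_CLOSE,        /* local application calls close(), we send a FIN */
    EV_RECV_FIN,         /* peer's FIN arrives */
    EV_RECV_ACK_OF_FIN,  /* peer acknowledges our FIN */
    EV_TIMEOUT           /* TIME_WAIT timer expires */
};

/* Return the state that follows 's' when event 'e' occurs. */
enum tcp_state tcp_next(enum tcp_state s, enum tcp_event e)
{
    switch (s) {
    case ST_ESTABLISHED:
        if (e == EV_APP_CLOSE) return ST_FIN_WAIT_1;  /* we close first */
        if (e == EV_RECV_FIN)  return ST_CLOSE_WAIT;  /* peer closes first */
        break;
    case ST_FIN_WAIT_1:
        if (e == EV_RECV_ACK_OF_FIN) return ST_FIN_WAIT_2;
        if (e == EV_RECV_FIN)        return ST_CLOSING;  /* simultaneous close */
        break;
    case ST_FIN_WAIT_2:
        if (e == EV_RECV_FIN) return ST_TIME_WAIT;
        break;
    case ST_CLOSING:
        if (e == EV_RECV_ACK_OF_FIN) return ST_TIME_WAIT;
        break;
    case ST_TIME_WAIT:
        if (e == EV_TIMEOUT) return ST_CLOSED;  /* only now is the port free */
        break;
    case ST_CLOSE_WAIT:
        if (e == EV_APP_CLOSE) return ST_LAST_ACK;
        break;
    case ST_LAST_ACK:
        if (e == EV_RECV_ACK_OF_FIN) return ST_CLOSED;  /* no TIME_WAIT here */
        break;
    case ST_CLOSED:
        break;
    }
    return s;  /* events that do not apply leave the state unchanged */
}

/* Apply a sequence of events and return the final state. */
enum tcp_state tcp_run(enum tcp_state s, const enum tcp_event *ev, int n)
{
    int i;

    for (i = 0; i < n; i++)
        s = tcp_next(s, ev[i]);
    return s;
}
```

Starting from ST_ESTABLISHED, the sequence EV_APP_CLOSE, EV_RECV_ACK_OF_FIN, EV_RECV_FIN ends in ST_TIME_WAIT, so the port stays occupied until the timer fires. The sequence EV_RECV_FIN, EV_APP_CLOSE, EV_RECV_ACK_OF_FIN ends directly in ST_CLOSED.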

SO_REUSEADDR should allow a port that is in the TIME_WAIT state to be reused, which it does properly in this case. However, I have found that certain OTHER cases are also allowed when they should not be. For instance, an additional telnetd can be started on port 23, reusing a previous outbound telnet connection on port 23, while an EXISTING telnetd server is already listening on 23! In other words, ktcp must search the entire list of control blocks, not just look at the last/first one found, to determine whether reuse should be allowed.
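A full-list check of the kind described could look something like the sketch below. This is not ktcp's actual data structure or code; struct cb_entry, the state names, and port_reusable() are hypothetical and only illustrate why every control block on the port has to be examined before reuse is granted.

```c
#include <stddef.h>

/* Hypothetical, simplified control block; ktcp's real structure differs. */
enum cb_state { CB_LISTEN, CB_ESTABLISHED, CB_TIME_WAIT };

struct cb_entry {
    unsigned short local_port;
    enum cb_state state;
    struct cb_entry *next;
};

/*
 * Return 1 if 'port' may be bound, 0 if the bind must be rejected.
 * With reuseaddr set, control blocks in TIME_WAIT are tolerated, but the
 * scan continues: any other control block on the same port (for example
 * a live listener) still blocks the bind.
 */
int port_reusable(const struct cb_entry *list, unsigned short port, int reuseaddr)
{
    const struct cb_entry *cb;

    for (cb = list; cb != NULL; cb = cb->next) {
        if (cb->local_port != port)
            continue;               /* unrelated port */
        if (reuseaddr && cb->state == CB_TIME_WAIT)
            continue;               /* stale entry, keep looking */
        return 0;                   /* port is genuinely in use */
    }
    return 1;                       /* no conflicting control block found */
}
```

A check that stopped at the first matching entry would wrongly allow the telnetd case whenever a TIME_WAIT entry happened to precede the listener in the list.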

I hope this clarifies both why a sleep(1) is needed for now, and why an occasional port reuse reject message is generated (a slightly longer sleep is needed; there is a race condition at exactly 1 second between ktcp's TIME_WAIT timeout and your app). I have also tried to clarify that SO_REUSEADDR works for this scenario, but is known to be buggy in other scenarios not seen here.

Thank you!

Nov 23 '21 15:11 ghaerr
