
Changed return text for ETIMEDOUT/WSAETIMEDOUT

Open hjelmeland opened this issue 10 years ago • 8 comments

Changed return text for ETIMEDOUT/WSAETIMEDOUT to “connection timeout”.

This is needed for the application to be able to tell the difference between a timeout of the TCP connection itself (ETIMEDOUT/WSAETIMEDOUT) and a normal return from a non-blocking socket (error codes EAGAIN/WSAEWOULDBLOCK). Until now, both situations returned the text “timeout”.
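A minimal sketch of the proposed distinction, for illustration only. The function name `describe_io_error` is hypothetical and is not LuaSocket's actual code; it only shows the mapping described above, using the POSIX error codes (on Windows the corresponding codes would be WSAETIMEDOUT and WSAEWOULDBLOCK).

```c
#include <errno.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper, not LuaSocket's real implementation: turns an errno
 * value from send()/recv() into the message text returned to the caller. */
const char *describe_io_error(int err) {
    if (err == 0)
        return NULL;                 /* no error */
    if (err == EAGAIN || err == EWOULDBLOCK)
        return "timeout";            /* non-blocking call would have blocked */
    if (err == ETIMEDOUT)
        return "connection timeout"; /* the TCP connection itself timed out */
    return strerror(err);            /* anything else: system message */
}
```

With this mapping an application can log and close the connection on “connection timeout” while still treating “timeout” as a signal to retry later.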

hjelmeland avatar Sep 03 '15 13:09 hjelmeland

Can you clarify in which scenario this difference would be relevant?

diegonehab avatar Sep 08 '15 17:09 diegonehab

My scenario is a copas.lua based server that sends responses to requests. If the remote device is physically disconnected/powered off before the response is sent, then I want the socket:send() to eventually time out, so the error can be logged, connection closed and cleaned up.

After further testing I have found that select() behaves differently on the device I am working on than on current Linux systems. On the Linux 2.6 based device, the problem is that after an ETIMEDOUT event on send(), select() never reports write-readiness for that socket. The result is that the connection is stuck forever in the copas.lua system.

On my PC with a current Linux, I found that after an ETIMEDOUT event on send(), select() does report write-readiness for the socket, and when copas calls :send() again, the error "closed" is returned, allowing the application to clean up the connection.

Even though this is less of a problem on recent Linux, I still think a timeout of the TCP connection itself is fundamentally different from an EAGAIN/WSAEWOULDBLOCK response from the socket API, and should give a different response to the application.

In order to preserve compatibility with copas.lua, I would change the return text on ETIMEDOUT/WSAETIMEDOUT instead of on EAGAIN/WSAEWOULDBLOCK.

If you want to experiment with TCP timeouts on send on Linux, you can run

sudo sysctl -w net.ipv4.tcp_retries2=3

to get timeouts in seconds instead of hours.

hjelmeland avatar Sep 09 '15 13:09 hjelmeland

Is this still desired and/or necessary?

ewestbrook avatar Mar 11 '19 06:03 ewestbrook

Can’t this be a 2.6-specific fix that solves just the hanging problem? Leaking a change out to every other system seems like overkill.

diegonehab avatar Nov 10 '23 08:11 diegonehab

Returning a different error per platform for the same condition doesn't sound right. And the platform being used on one end of the socket may not be the same as the platform on the other end. If we do differentiate the error messages for the two different error cases, I think it should probably be done universally, with no platform or version gating logic.

alerque avatar Nov 10 '23 18:11 alerque

I meant solving the select hang, not returning a different error code on different platforms. I think this is the cleanest solution.

diegonehab avatar Nov 10 '23 18:11 diegonehab

Ah, I suppose that would make sense if it is possible.

I'm not in a position to test and root this out though so if somebody still has this issue and/or wants to jump in I'll help facilitate a contribution.

alerque avatar Nov 10 '23 18:11 alerque

Even if we needed special code inside an ifdef and an additional flag in the socket structure just to work around this bug, I think it would be preferable to adding a new return that could break a lot of code out there.
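One way such a workaround might look, purely as a sketch: the struct, the flag, the macro and both function names below are hypothetical, and nothing here is taken from LuaSocket. The idea is to remember that send() failed with ETIMEDOUT and to report "closed" on the next attempt instead of waiting in select(), which matches what current Linux was reported to do earlier in this thread.

```c
#include <errno.h>
#include <stddef.h>

/* Assumption: in a real build this macro would be set by the build system
 * only on affected platforms. It is defined here so the sketch compiles. */
#define WORKAROUND_SELECT_AFTER_ETIMEDOUT 1

/* Hypothetical socket wrapper with the extra flag. */
typedef struct {
    int fd;
    int conn_timed_out; /* set once send() has failed with ETIMEDOUT */
} demo_sock;

/* Call after a failed send(): remember that the connection itself is dead. */
void note_send_error(demo_sock *s, int err) {
#ifdef WORKAROUND_SELECT_AFTER_ETIMEDOUT
    if (err == ETIMEDOUT)
        s->conn_timed_out = 1;
#else
    (void) s;
    (void) err;
#endif
}

/* Call before waiting for write-readiness: returns "closed" if the socket
 * is already known to be dead (so select() can be skipped), else NULL. */
const char *precheck_wait(const demo_sock *s) {
    return s->conn_timed_out ? "closed" : NULL;
}
```

Existing callers keep seeing only "timeout" and "closed", so no new return value is introduced.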

diegonehab avatar Nov 10 '23 19:11 diegonehab