gunicorn
shutdown() socket before close()
This should prevent problems where writing a bunch of data to the socket before closing it causes data to be lost, because the data was not actually written before the socket and the TCP connection go away.
Fixes #840
For more information, see https://blog.netherlabs.nl/articles/2009/01/18/the-ultimate-so_linger-page-or-why-is-my-tcp-not-reliable
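The idea can be sketched as follows. This is an illustrative helper, not gunicorn's actual code; the function name `graceful_close` is made up for this example:

```python
import socket

def graceful_close(sock):
    """Signal end-of-stream before closing so buffered data is flushed.

    shutdown() sends a FIN after any queued outgoing bytes, whereas a
    bare close() may let the kernel discard data that was written but
    not yet transmitted.
    """
    try:
        sock.shutdown(socket.SHUT_RDWR)
    except OSError:
        pass  # the peer may already be gone
    finally:
        sock.close()
```

With a local socketpair, data written before `graceful_close()` is still delivered to the peer, followed by a clean EOF.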
We already do this : https://github.com/benoitc/gunicorn/blob/master/gunicorn/workers/sync.py#L199
Same in the async worker. Which worker are you using? I would not put it there anyway, as we don't always want it; only apply the option where it's needed.
Thanks!
I've very recently tracked down something similar in gevent. Some platforms are extremely sensitive to this issue, others much less so. Adding OpenSSL on top of it makes things even worse.
If there's a chance that sock could be an ssl.SSLSocket, it's important to call sock.unwrap() before calling sock.shutdown(), because there's a layer of buffering there too that needs to be shut down; this is especially true on Windows with Python 3.7+, for whatever reason:
```python
import socket

def close(sock):
    if hasattr(sock, 'unwrap'):
        try:
            sock = sock.unwrap()
        except socket.error as e:
            if e.args[0] != 0:
                # Some platforms sometimes produce an exception with
                # errno = 0 here. Or maybe that's just a gevent thing.
                raise
    sock.shutdown(socket.SHUT_RDWR)
```
We already do this :
That's only in the event of an exception, though. In the regular handle() loop, once we've exhausted all the requests, we seem to just call util.close(sock). https://github.com/benoitc/gunicorn/blob/4bed9e7b192ae211b7d1f6c14065c373c9e1c0aa/gunicorn/workers/sync.py#L159
And there are places where we don't even do that, we just directly close it. https://github.com/benoitc/gunicorn/blob/4bed9e7b192ae211b7d1f6c14065c373c9e1c0aa/gunicorn/workers/sync.py#L141-L144
Yes, because in theory you shouldn't need to shut down the socket yourself, since we read it until the end... We used to do that (~2010), but for some reason it was removed.
I need to find the related commit to refresh myself about that.
@jamadden the behavior in the SSL snippet is expected: the socket is already closed there, and you only want to destroy the fd (at least in theory ;)
More generally speaking, I would prefer to handle the shutdown explicitly in the places where we are sure it's needed. I would accept a patch that does that.
Note: it's odd anyway that the error only happens on Heroku. I would be interested to know what internals there could trigger it.
One reason not to call shutdown (speaking from memory, I may have this completely wrong): By actively sending the FIN packet, it starts timers in intermediate network stacks and can result in a slow client not actually being able to read all the data that's theoretically been sent.
it's odd anyway that the error only happens on Heroku. I would be interested to know what internals there could trigger it.
Yeah, me too. When I was dealing with this (just today!) I could reproduce it 100% of the time in a Fedora 33 RawHide docker container, but not at all natively on macOS or in the manylinux2010 docker container; I've seen it intermittently on Ubuntu Xenial on Travis and possibly even on AppVeyor/Windows 10. It seems this has to do with quite low-level kernel networking parameters (buffer sizes?) — but that's on localhost, which behaves very differently than addresses that actually go across the network.
OK, I found the place where this shutdown was removed (a long time ago): https://github.com/benoitc/gunicorn/commit/615c6a692703d3f6376bcffca2c0cff147196a0c#diff-0b90f794c3e9742c45bf484505e3db8d
The commit log doesn't reflect the whole change, unfortunately. But if I remember correctly, shutdown is blocking and can take some time, which may have been an issue at that time.
Anyway, like I said, let's be conservative there. @erjiang can you possibly patch gunicorn by adding this shutdown in the place it's needed, instead of in this close function? It would be useful. (I would also add the change from @jamadden.)
The commit log doesn't reflect the whole change, unfortunately. But if I remember correctly, shutdown is blocking and can take some time, which may have been an issue at that time.
It makes sense that shutdown is blocking in order to wait for the rest of the data to be written. From my understanding, without shutdown(), there's no guarantee that the kernel won't simply close the socket and drop any remaining data. It may work 99% of the time, or in "most" configurations, but without it we are effectively saying, "close the socket and don't worry about sending out the rest of the data".
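This flush-then-close pattern is sometimes called a "lingering close": half-close the write side, then read until the peer signals EOF before dropping the socket. A minimal sketch under that assumption (the name `lingering_close` is illustrative, not a gunicorn API):

```python
import socket

def lingering_close(sock, timeout=2.0):
    """Half-close the write side, then drain the peer until EOF.

    SHUT_WR queues our FIN after all pending outgoing data, and reading
    until b'' gives the peer time to consume everything before we close.
    The timeout bounds how long a worker can be held up by a slow peer.
    """
    sock.settimeout(timeout)
    try:
        sock.shutdown(socket.SHUT_WR)
        while sock.recv(4096):   # discard anything the peer still sends
            pass
    except OSError:
        pass  # timeout or peer reset: give up and close anyway
    finally:
        sock.close()
```

The timeout is the important design choice here: it addresses the "shutdown is blocking" concern above by capping how long the worker waits.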
Anyways, I will investigate a bit more to see if there's a better place to call shutdown(), but it's not clear to me in which situations we would not want it. Maybe only error cases don't need shutdown()?
Anyways, I will investigate a bit more to see if there's a better place to call shutdown(), but it's not clear to me in which situations we would not want it. Maybe only error cases don't need shutdown()?
Yes it normally should be enough. I will investigate it tomorrow also. (it's late there)
One potential trigger of the issue is not fully reading the socket before the socket is closed. That would explain why it's so easy to trigger the issue in the sample code I posted - handling a POST in Flask without reading the POST body seems to trigger this problem. The relevant RFCs state that if there's data remaining in the recv buffer when close() is called, then RST should be sent instead of FIN.
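If unread input triggering an RST is indeed the cause, one mitigation is to drain the receive buffer (up to a bound) before closing. A hedged sketch, not gunicorn code; `drain_then_close` and its limit are made up for illustration:

```python
import socket

def drain_then_close(sock, limit=64 * 1024):
    """Read any unconsumed input before closing.

    If data remains in the receive buffer when close() is called, the
    stack may send RST rather than FIN, which can destroy response data
    still in flight to the peer. Draining first (up to a bounded limit,
    so a malicious client can't stall us) avoids that reset.
    """
    sock.setblocking(False)
    try:
        while limit > 0:
            chunk = sock.recv(4096)
            if not chunk:
                break
            limit -= len(chunk)
    except OSError:
        pass  # nothing more to read right now
    finally:
        sock.close()
```

This would cover the unread-POST-body case: the leftover body is consumed and discarded, so close() can send a normal FIN.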
Anyways, it seems like shutdown() is always needed. The only scenario that I can think of to not shutdown() is if there are multiple users of the socket, which I think you acknowledge in your old commit as "(trick from twisted)". But it doesn't look like the current code does that in sync.py, base_async.py, or gthread.py.
I updated my PR to include the sock.unwrap call, but left the code in util.py.
Hi! What are the next steps for this PR? :-)
There are some places where we call shutdown and close in the worker code. Should we change these to call this utility?
Never mind. I think we already do. I see a cleanup, potentially, but I'll investigate that after the merge.
Never mind. I think we already do. I see a cleanup, potentially, but I'll investigate that after the merge.
Care to elaborate? In its current state I wouldn't approve this change, since it doesn't explicitly shut down the socket only where it's needed. We don't want to block a worker. From my notes from that old time, the reasoning was that the application should take care of making sure the response is fully read before releasing the request (or we finish the iteration).
The PEP 3333 says:
When writes are done during the execution of an application object, the application can ensure that resources are released using a try/finally block. But, if the application returns an iterable, any resources used will not be released until the iterable is garbage collected. The close() idiom allows an application to release critical resources at the end of a request [..]
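The close() idiom from PEP 3333 looks like this in practice. A minimal sketch; the class and app names are invented for illustration:

```python
class IterableResponse:
    """PEP 3333 close() idiom: the server (e.g. gunicorn) must call
    close() on the returned iterable once the response is finished, so
    the application can release resources deterministically instead of
    waiting for garbage collection."""

    def __init__(self, chunks):
        self._chunks = iter(chunks)
        self.closed = False

    def __iter__(self):
        return self

    def __next__(self):
        return next(self._chunks)

    def close(self):
        # Release request-scoped resources (file handles, DB cursors, ...)
        self.closed = True


def app(environ, start_response):
    start_response("200 OK", [("Content-Type", "text/plain")])
    return IterableResponse([b"hello, ", b"world"])
```

A conforming server iterates the result, sends it, and then calls close() whether or not the iteration completed.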
In any case, make the closing asynchronous.
Revisiting this old PR. It looks like we currently do shutdown only when an exception gets raised. I'm guessing that's to ensure that headers get sent.
What's the expectation we want for applications? Do they need to somehow flush their own writes or should we be calling shutdown for them on a clean request exit?
No activity in a while. Closing; feel free to create a new ticket if needed.