Fix cohttp EOF read
This work has been done together by the following people: @savvadia @vect0r-vicall @picojulien @johnyob @raphael-proust
Motivation
We have discovered that there is a resource leak for particular cases. Here are two possible scenarios:
- If the
callbacknever resolves,EOFis never detected, leading to a resource leak. - If the
callbackis very slow and resource-intensive,EOFis not detected until thecallbackresolves. Consequently, client resources are wasted even after the connection has been closed by the peer.
Solution
A new wait_eof_or_closed function is introduced, which repeatedly peeks (using MSG_PEEK) into the data stream for an EOF, in which case it closes the connection. MSG_PEEK used for this does not disturb reading from the stream by the handle_client. If the connection is closed locally before this happens, the function stops waiting for EOF.
If provided to the wait_eof_or_closed, the sleep_fn argument will be used to dictate the periodicity of the checks for EOF from the peer, yielding control periodically. If sleep_fn is not provided, the server keeps its legacy behaviour.
PR format
The format of the PR is as follows:
- The first
[WIP]commit adds atest_leak.mltest, which showcases theEOFleak; the[WIP]tag highlights that this test is not something that we want to be added necessarily, but we would be happy if you find it desirable or helpful and choose to incorporate it - The second commit fixes the leak showcased in the first commit
Testing
You can observe the initial leaking behaviour by cherry picking only the first commit:
$ g checkout v5-backports
$ g cherry-pick <first_commit>
$ make
$ dune exec ./_build/default/cohttp-lwt-unix/test/test_leak.exe
Server running on port 8080
Now, the server is listening, so you can call on a separate console curl -s 'localhost:8080/sleep' and observe that the server keeps printing messages of I slept. Now, when you close the curl request with CTRL+C, the server does not stop.
To solve this problem, add the second commit as well, and re-do this test, and after stopping the curl request, you will see this:
$ dune exec ./_build/default/cohttp-lwt-unix/test/test_leak.exe
Server running on port 8080
Cohttp connection on 1
I slept
I slept
I slept
I slept
Cohttp connection closed
So, the leak is now fixed.
One can also monitor the resources used by the executable thanks to a tool such as lsof:
$ lsof -c test_leak.exe | grep localhost
Ping @rgrinberg and @mefyl who might be able to review.
Thank you for the detailed report and reproduction case!
TLDR: this is IMO more a design choice than a bug. But it's a good design question.
There is a preliminary question to all this: is it a good idea to kill the server request handler if the client disconnects? I'm not really sure it is, as one could have some heavy processing server side after sending the response headers, and abruptly killing this processing if the client disconnects before the end of the body is a very opinionated choice. In the test example, the "resource leak" mentioned exists because the handler never ever returns, so the handler is a user-created resource leak in itself.
Now I do reckon that your point has merit, and stopping processing of a request when the client disconnects is a desirable feature in some if not most cases - it's just that failing to do so is a sub-optimal while doing so inappropriately is a straight bug. With that in mind I think it would be better to provide the expert request handler with the possibility to monitor client disconnection, and maybe provide some simpler default behavior in the simple interface. Happy to discuss this.
Thank you for the detailed report and reproduction case!
TLDR: this is IMO more a design choice than a bug. But it's a good design question.
There is a preliminary question to all this: is it a good idea to kill the server request handler if the client disconnects? I'm not really sure it is, as one could have some heavy processing server side after sending the response headers, and abruptly killing this processing if the client disconnects before the end of the body is a very opinionated choice. In the test example, the "resource leak" mentioned exists because the handler never ever returns, so the handler is a user-created resource leak in itself.
Now I do reckon that your point has merit, and stopping processing of a request when the client disconnects is a desirable feature in some if not most cases - it's just that failing to do so is a sub-optimal while doing so inappropriately is a straight bug. With that in mind I think it would be better to provide the expert request handler with the possibility to monitor client disconnection, and maybe provide some simpler default behavior in the simple interface. Happy to discuss this.
Thank you for your response!
We agree that there is motivation behind not defaulting to closing the connection automatically when the client disconnects and EOF is received. We therefore propose to have one more optional callback to configure this behaviour.
In cohttp-lwt/src/server.ml we proposed to add sleep_fn optional callback to check for EOF periodically.
We can add another one, callback_on_eof which will be triggered if EOF is detected. The default behaviour (if sleep_fn is provided and thus cohttp is configured to look for EOF) is to cancel the promise.
So the whole solution will be the following:
- if the client wants to wait for
EOFwhile handling the request,sleep_fnis provided- if the client wants to interrupt request handling on
EOF,callback_on_eofis not provided andcohttpusesLwt.pickwaiting forEOF - if the client provides
callback_on_eofthencohttpwaits forEOFinLwt.choosewhich will not interrupt the promise execution. IfEOFis received, the callback is called and then it's up to the client what to do with the main handler
- if the client wants to interrupt request handling on
The new callback should accept conn in the same way as callback to identify the connection:
type t = {
callback : conn -> Cohttp.Request.t -> Body.t -> response_action Lwt.t;
conn_closed : conn -> unit;
sleep_fn : (unit -> unit Lwt.t) option;
callback_on_eof : conn -> unit;
}
Please let us know what you think, we would be happy to chat about it. We can also showcase this adaptation in a fixup commit, if that is clearer for you. Would you prefer to continue this discussion in mails/comments here, or would it be easier for you to have a call?
@art-w Do you know how we can move forward on this?
@gabrielmoise17 Hey Gabriel! Just wondering if you found a workaround to avoid getting the EOF exceptions?