hackney icon indicating copy to clipboard operation
hackney copied to clipboard

Responses with 'connection: close' are sometimes lost

Open g-andrade opened this issue 6 years ago • 6 comments

Hi,

There appears to be an edge case in the way clean connection shutdowns are handled; sometimes, due to nondeterministic delays and concurrency, hackney is attempting to synchronously receive (hackney_tcp:recv/2) either the status line or the response headers and it gets {error, closed} because the connection has, well, closed.

Now, the issue here is that this may happen even when we've gotten a valid HTTP response specifying that the connection was indeed about to close, but it's too late to read it. The end result is that the caller gets {error, closed} instead of the actual HTTP response.

The easiest way to reproduce this is to try executing a non-keepalive request with a large body, and possibly tweak the underlying connection's buffer to rather low values.

I'm attaching a rebar3 project that reproduces the issue using hackney 1.9.0, cowboy and a pool of processes executing requests in parallel with payloads on the 50 MiB range. To run it, one can execute make run, which will boot a rebar3 shell, and then execute hackney_error_closed:run(). In my machine I'm usually able to reproduce the issue after only a handful of requests.

hackney_error_closed.zip

It could be argued that cowboy / ranch might be closing the connections before the network stack has had a chance to flush pending data, but I've checked Wireshark and the full response was being sent.

g-andrade avatar Oct 05 '17 16:10 g-andrade

Using async didn't help; most of the time the connection gets closed while I'm waiting either for the status line or the response headers. I suspect this is due to the way active mode is toggled back and forth in hackney_stream; it provides windows of opportunity for the connection to close before dispatching all the received data.

g-andrade avatar Oct 05 '17 17:10 g-andrade

did you try with the with_body parameter?

benoitc avatar Oct 15 '17 19:10 benoitc

I did; the test case I attached to the opening comment replicates the issue using with_body. It's a tricky situation to solve - if one reacts to {error, closed} when calling gen_tcp:send / ssl:send before either A) setting the socket to active mode or B) doing so but nevertheless ignoring the process inbox, the response will be lost. I believe it's necessary to make the connection process aware of a 'half open' sort of state, wherein data can still be received but no longer sent - eventually the socket closure message will come in and this gives a chance to receive the response just before.

g-andrade avatar Oct 15 '17 20:10 g-andrade

@g-andrade sorry I missed the link. I will test. imo the right fix would be maintaining the socket in its own process . So it will continue to receive the sockets events and queue them locally (until some level).

benoitc avatar Oct 17 '17 10:10 benoitc

I seem to be having the same problem above.

1> hackney:get(<<"https://1.1.1.1/FILE.WAV">>, [], <<"">>, [insecure, with_body]).
{error,{closed,<<>>}}

But if I make a request directly using ssl I get the data back

1> {ok, Socket} = ssl:connect('1.1.1.1', 443, [{verify, verify_none}]).
{ok,{sslsocket,{gen_tcp,#Port<0.411582>,tls_connection,
                        undefined},
               [<0.92.15>,<0.57.15>]}}
2> ssl:send(Socket, <<"GET /FILE.WAV HTTP/1.1\r\nHost: 1.1.1.1\r\n\r\n">>).
ok
3> receive Msg -> rp(Msg) end.
{ssl,{sslsocket,{gen_tcp,#Port<0.411582>,tls_connection,
                         undefined},
                [<0.92.15>,<0.57.15>]},
     "HTTP/1.1 200 OK\r\nDate: Fri, 11 Jan 2019 10:17:35 GMT\r\nServer: Apache/2.2.15 (CentOS)\r\nX-Powered-By: PHP/5.3.3\r\nContent-Disposition: attachment; filename=\"FILE.WAV\"\r\nContent-Transfer-Encoding: binary\r\nContent-Length: 711\r\nConnection: close\r\nContent-Type: audio/x-wav\r\n\r\n"}
ok
4> receive Msg2 -> rp(Msg2) end.
{ssl,{sslsocket,{gen_tcp,#Port<0.411582>,tls_connection,
                         undefined},
                [<0.92.15>,<0.57.15>]},
     [82,73,70,70,190,2,0,0,87,65,86,69,102,109,116,32,20,0,0,0,…]}
ok
5> receive Msg3 -> rp(Msg3) end.
{ssl_closed,{sslsocket,{gen_tcp,#Port<0.411582>,
                                tls_connection,undefined},
                       [<0.92.15>,<0.57.15>]}}
ok

I’m using hackney 1.15.0, erlang 21.2.2

andrewtimberlake avatar Jan 11 '19 10:01 andrewtimberlake

I did try to reproduce the issue using the code from the first comment. Error actually reproduces quite easily. However, after some digging and packet sniffing I found that actual cause of such behaviour is in the server code. Server handler does not actually read the whole request body. This leads to a known issue with half-closed TCP connection state, when one party already closed the connection (sending FIN flag, in out case it is server side), and other party is still trying to deliver some data (client side in our case). Server has nothing to do except setting RST flag, leading to immediate connection termination and losing client' buffer. Here is an example packet capture screenshot (I don't think you really need the .cap file here):

Screenshot 2021-06-18 at 15 27 39

If you modify server to read the whole sent body before sending the response, you will see that no repro occurs. e.g.

-module(hackney_error_closed_handler).
-behaviour(cowboy_handler).

-export([init/2]).

init(Req1, State) ->
    Req2 = dump_body(Req1),
    Req3 = cowboy_req:reply(200, #{<<"connection">> => <<"close">>}, <<"foobar">>, Req1),
    {ok, Req3, State}.

dump_body(Req0) ->
  case cowboy_req:read_body(Req0) of
    {ok, Body, Req1} -> Req1;
    {more, Body, Req1} -> dump_body(Req1)
  end.

Probably, the best approach to fix this is to untie up- and downstreams of TCP connection by either selective receives in a separate process, or even using two different processes for rx and tx activity on the socket.

shizzard avatar Jun 18 '21 12:06 shizzard