http: Requests with body keep the connection open for an extra 5 seconds

Request:

> HTTP.post('https://example.com/any', :body => "some")

=> #<HTTP::Response/1.1 404 Not Found {"Server"=>"nginx/1.14.2",
"Date"=>"Wed, 30 Jan 2019 12:02:30 GMT",
"Content-Type"=>"text/html; charset=utf-8",
"Content-Length"=>"1564", "Connection"=>"close",
"X-Request-Id"=>"6976f707-699c-450b-8f99-fecce91cf9b3",
"X-Runtime"=>"0.007562"}>

Server log on Nginx:

192.168.241.13 - - [30/Jan/2019:12:02:35] https://example.com "POST /any HTTP/1.1"
status:404 req_time:5.028 up_conn:0.001 up_head_time:0.019 up_resp_time:0.019
up_status:404 up_addr:192.168.241.13:80 req_full_length:133 resp_full_length:1803
resp_body_length:1564 cache:- ref:"-" agent:"http.rb/1.0.2"

Notice that the request was sent at 12:02:30, but Nginx logged it at 12:02:35 with a request time of 5.028 seconds: 0.028 seconds of real work and a 5.000-second delay before the connection was closed. The same issue occurs with other request types such as GET, as long as the request has a non-empty body (:form, :body or :json).
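To make the timing arithmetic explicit, here is a small Ruby sketch (standard library only) that pulls the timing fields out of the log entry above. It assumes the entry is a single line and, because the logged `$time_local` has no timezone suffix, treats it as UTC. `LOG_LINE` and `request_timing` are names made up for this illustration. Since the access log is written when the request finishes, subtracting `req_time` should give the approximate start time, and `req_time - up_resp_time` approximates how long nginx spent on the request outside the upstream.

```ruby
require "time"

# The access-log entry above, joined onto one line (format: combined_ext).
LOG_LINE = '192.168.241.13 - - [30/Jan/2019:12:02:35] https://example.com "POST /any HTTP/1.1" ' \
           'status:404 req_time:5.028 up_conn:0.001 up_head_time:0.019 up_resp_time:0.019 ' \
           'up_status:404 up_addr:192.168.241.13:80 req_full_length:133 resp_full_length:1803 ' \
           'resp_body_length:1564 cache:- ref:"-" agent:"http.rb/1.0.2"'

# Hypothetical helper: extracts timing fields from one combined_ext line.
def request_timing(line)
  # Assumption: $time_local carries no zone here, so UTC is appended.
  stamp = line[/\[([^\]]+)\]/, 1]
  logged_at = Time.strptime("#{stamp} +0000", "%d/%b/%Y:%H:%M:%S %z")
  req_time = line[/\breq_time:([\d.]+)/, 1].to_f
  upstream_time = line[/\bup_resp_time:([\d.]+)/, 1].to_f
  {
    # The log is written at the end of the request, so step back by req_time.
    started_at: (logged_at - req_time).round(3).utc,
    req_time: req_time,
    upstream_time: upstream_time,
    # Time nginx spent on the request beyond waiting for the upstream.
    outside_upstream: (req_time - upstream_time).round(3)
  }
end

timing = request_timing(LOG_LINE)
puts timing[:started_at].strftime("%H:%M:%S.%L")  # 12:02:29.972
puts timing[:outside_upstream]                     # 5.009
```

So roughly 5 of the 5.028 seconds were not spent waiting for the upstream, which matches the delay described above.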

I've seen a 5-second keep-alive timeout used in the code as the default for persistent connections, but no persistent connection was explicitly requested; all options were left at their defaults.

How to set up the Nginx log:

    log_format combined_ext
        '$remote_addr - $remote_user [$time_local] '
        '$scheme://$host '
        '"$request" status:$status '
        'req_time:$request_time '
        'up_conn:$upstream_connect_time up_head_time:$upstream_header_time up_resp_time:$upstream_response_time up_status:$upstream_status up_addr:$upstream_addr '
        'req_full_length:$request_length resp_full_length:$bytes_sent resp_body_length:$body_bytes_sent cache:$upstream_cache_status '
        'ref:"$http_referer" agent:"$http_user_agent"';
    server {
        listen 80;
        server_name example.com;
        access_log /var/log/nginx/access.log combined_ext;
    }

Jan 30 '19 12:01 Vanav

A response keeps the connection open until its body is consumed. There are a couple of ways to deal with this:

# pretty much syntax sugar for `tap(&:to_s)`.
response = HTTP.post(...).flush

or:

response = HTTP.post(...)
body = response.to_s

or consume the body in chunks to avoid loading it fully into memory:

response = HTTP.post(...)

File.open("/tmp/output", "wb") do |io|
  while (chunk = response.readpartial)
    io << chunk
  end
end

or close the connection explicitly when you're OK with discarding the response:

client = HTTP::Client.new

begin
  response = client.post(...)

  unless response.status.ok?
    warn "Unexpected HTTP response: #{response.status}"
    exit 1
  end
ensure
  # connection will be forcefully closed even if body was not consumed
  client.close
end

# note that if you close the connection before consuming the body, the body is lost:
puts response.to_s # => ""
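The behaviour described above can be modelled with a small, self-contained toy. `ToyConnection` and `ToyResponse` are hypothetical classes written for this illustration, not http.rb's implementation: the connection counts as released only once the body has been read to the end, `flush` mirrors the `tap(&:to_s)` sugar, and closing before reading discards the unread bytes.

```ruby
# Hypothetical toy model, NOT http.rb's implementation: the connection stays
# checked out until the body has been read to the end or the client closes it.
class ToyConnection
  def initialize(payload, chunk_size: 4)
    @payload = payload.dup
    @chunk_size = chunk_size
    @open = true
  end

  def open?
    @open
  end

  # Returns the next chunk, or nil once the body is exhausted or the
  # connection was closed. Reading past the end releases the connection.
  def readpartial
    return nil unless @open

    chunk = @payload.slice!(0, @chunk_size)
    if chunk.empty?
      @open = false
      return nil
    end
    chunk
  end

  # Forceful close: any unread part of the body is discarded.
  def close
    @payload.clear
    @open = false
  end
end

class ToyResponse
  def initialize(connection)
    @connection = connection
  end

  def readpartial
    @connection.readpartial
  end

  # Buffers the whole body in memory (memoized); this drains the connection.
  def to_s
    @to_s ||= begin
      buffer = +""
      while (chunk = readpartial)
        buffer << chunk
      end
      buffer
    end
  end

  # Consume the body, then return the response itself, like `tap(&:to_s)`.
  def flush
    tap(&:to_s)
  end
end

# 1. flush / to_s: the body is buffered and the connection is released.
conn = ToyConnection.new("some body")
response = ToyResponse.new(conn).flush
puts conn.open?          # false
puts response.to_s       # some body

# 2. readpartial: stream the body in bounded chunks.
conn = ToyConnection.new("some body")
response = ToyResponse.new(conn)
chunks = []
while (chunk = response.readpartial)
  chunks << chunk
end
p chunks                 # ["some", " bod", "y"]
puts conn.open?          # false

# 3. close first: the connection is released but the body is lost.
conn = ToyConnection.new("some body")
response = ToyResponse.new(conn)
conn.close
p response.to_s          # ""
```

The point of the model is that "connection released" and "body available" are tied to the same event: reading to the end gives you both, while closing early gives you only the former.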

There's also a plan to provide an API that simplifies the explicit-close example, but work on it has not started yet.

Jan 30 '19 13:01 ixti

Thank you for describing this behavior.

I expect high-level methods to either consume or discard the response body by default, rather than keep the connection open while waiting for slow client code. The current default behavior is really bad and unexpected in a high-load environment.

Low-level methods can work in real time, with a manual connect, body read, and connection close.
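For comparison, Ruby's standard-library `Net::HTTP` (not http.rb) exposes that low-level flow directly: `start` opens the connection, a request made without a block reads the whole body, and `finish` closes the socket. The sketch below runs against a hypothetical one-shot local server (`start_one_shot_server` is made up for this illustration) standing in for example.com, so it needs no external network access.

```ruby
require "net/http"
require "socket"

# Hypothetical one-shot local server: reads one request (headers plus a
# Content-Length body), sends a fixed response, then closes everything.
def start_one_shot_server(response_body)
  server = TCPServer.new("127.0.0.1", 0)
  thread = Thread.new do
    client = server.accept
    headers = []
    while (line = client.gets) && line != "\r\n"
      headers << line
    end
    length = headers.grep(/\Acontent-length:/i).first.to_s[/\d+/].to_i
    client.read(length) if length.positive?
    client.write("HTTP/1.1 200 OK\r\nContent-Length: #{response_body.bytesize}\r\n" \
                 "Connection: close\r\n\r\n#{response_body}")
    client.close
    server.close
  end
  [server.addr[1], thread]
end

port, server_thread = start_one_shot_server("hello")

http = Net::HTTP.new("127.0.0.1", port)
http.start                            # 1. open the TCP connection
response = http.post("/any", "some")  # 2. send a request with a body
body = response.body                  # 3. the body has already been read in full
http.finish                           # 4. close the connection explicitly
server_thread.join

puts response.code   # 200
puts body            # hello
puts http.started?   # false
```

Each step is a separate call, so the caller decides exactly when the connection is opened and released.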

At the very least, it should be documented in https://github.com/httprb/http/wiki/Making-Requests and https://github.com/httprb/http/wiki/Passing-Parameters that, in this simple use case:

response = HTTP.post("http://example.com/resource", :form => {:foo => "42"})
response.code

the connection will be kept open for a long time.

Jan 30 '19 14:01 Vanav

I somewhat agree that the documentation should be improved, but I disagree that the default behaviour is bad.

If you don't want the connection to stay open, close it implicitly (by consuming the body with any of the methods I listed above) or explicitly (also shown above). It might just be my biased opinion, as I've gotten used to this API, but I don't find it THAT confusing. The trickiest part is that, unfortunately, there's no one-size-fits-all solution.

I don't see this as a high-load issue either. The default behaviour just does not match your expectations, nothing more. In your case you always need the response to be fully consumed into memory. I've had the opposite experience, where I didn't need the response body at all, just the headers. And I can't think of an API that would be intuitive in both cases without having to dig through the documentation anyway.

As I said, the documentation can be improved. It's just that I myself never read documentation: I usually dig through the sources and use documentation mostly as a quick map of what I might need.

This is an open source project, and all proposals are very welcome. If you think the API or documentation is bad or confusing, please open a discussion issue or a pull request with proposed improvements. Just calling the API bad is counterproductive IMO.

Jan 30 '19 15:01 ixti

For what it's worth, the API @Vanav seems to want sounds an awful lot like the one this library had in its earliest releases:

body = HTTP.get("http://example.com")

That was it, you'd always get 100% of the body or an error. Easy peasy!

Adding streaming support, which necessitated adding a response object, made things slightly more complex:

body = HTTP.get("http://example.com").to_s

Unfortunately that's the cost of being able to support streaming, which was one of this library's design goals.

That pattern should do what @Vanav wants (as @ixti already mentioned) and is thoroughly documented:

https://github.com/httprb/http#basic-usage
https://github.com/httprb/http/wiki/Making-Requests
https://github.com/httprb/http/wiki/Response-Handling

Feb 04 '19 22:02 tarcieri
