httpclient
httpclient copied to clipboard
HttpClient::Session#read_body_chunked timeout logic unoptimal?
I've been trying to understand why there is 80...100+ "Module::Timeout>#timeout" threads appearing while there is only some 7 actual REST API requests using HttpClient. After some digging in, I've started to ponder if read_body_chunked logic triggers those threads. For me "::Timeout#timeout" logic looks a bit inefficient, since it seems to trigger new thread (and destroy it in the end) on every invocation.
RS = "\r\n"
def read_body_chunked(&block)
buf = empty_bin_str
while true
::Timeout.timeout(@receive_timeout, ReceiveTimeoutError) do
len = @socket.gets(RS)
if len.nil? # EOF
close
return
end
@chunk_length = len.hex
if @chunk_length == 0
@content_length = 0
@socket.gets(RS)
return
end
@socket.read(@chunk_length, buf)
@socket.read(2)
end
unless buf.empty?
yield buf
end
end
end
According to ruby-prof
1.76% (79.97%) HTTPClient#request [1 calls, 7 total]
1.74% (98.89%) HTTPClient#do_request [1 calls, 7 total]
1.73% (99.60%) HTTPClient#protect_keep_alive_disconnected [1 calls, 7 total]
1.73% (99.92%) HTTPClient#do_get_block [1 calls, 7 total]
1.64% (94.51%) HTTPClient::Session#get_body [1 calls, 7 total]
1.63% (99.95%) HTTPClient::Session#read_body_chunked [1 calls, 2 total]
1.60% (97.74%) <Module::Timeout>#timeout [59 calls, 85 total]
1.40% (87.74%) <Class::Thread>#start [59 calls, 85 total]
There is around 8 timeout threads started per rest api request.
Tried out experimental patch, which in my use case works fine (since "total time of whole request including transfer time" is good enough factor for in my use specific case; in general processing time dominates in this case, transfer time over network is not relevant)
class HTTPClient
class Session
def read_body_chunked(&block)
::Timeout.timeout(@receive_timeout, ReceiveTimeoutError) do
buf = empty_bin_str
while true
len = @socket.gets(RS)
if len.nil? # EOF
close
break
end
@chunk_length = len.hex
if @chunk_length == 0
@content_length = 0
@socket.gets(RS)
break
end
@socket.read(@chunk_length, buf)
@socket.read(2)
unless buf.empty?
yield buf
end
end
end
end
end
end
After this httpclient logic is not consuming enough time to appear in ruby-prof profiling results, and amount of Timeout#timeout threads is decreased significantly.
@kikonen Thanks for the report. It's interesting. Surely Timeout::timeout creates Thread for each timeout call in Ruby, and that's the reason why httpclient had custom Timeout: https://github.com/nahi/httpclient/blob/master/lib/httpclient/timeout.rb But after 1.9 the performance of Thread creation became faster so I stopped using it: https://github.com/nahi/httpclient/blob/master/lib/httpclient/timeout.rb#L26
What version of Ruby are you using? Possibly Timeout.timeout became slower these days.
Re: your proposed patch, it should improve performance by removing Thread.timeout calls. But it also changes the semantics of receive_timeout because current receive_timeout is for time of each read operation, not the accumulated time of read operations. If accumulated time works for your case, you can do the same thing manually by the following code.
client.receive_timeout = nil # 60 by default, nil means no timeout == no Timeout thread
Timeout.timeout(60) do
client.get(...)
end
In my case access is into internal private REST API, and if there's access problem then it most certainly occurs in initial connection. Also logic (in my case) works so that there's anyway separate timeout, which goes over whole operation (i.e. proxy servers between will timeout XHR call done from web browser side anyway after certain timeout period).