:connect_timeout errors on initial batch of requests
I've been following/commenting on some previous issues regarding :connect_timeout errors reported here, and the recent 0.11/0.12 releases have reduced these dramatically. However, I'm still experiencing an interesting issue using Hackney 1.12.1 and HTTPoison 0.13.0.
I make an initial batch of ~80 requests to several hosts simultaneously. Of those, about half result in %HTTPoison.Error{id: nil, reason: :connect_timeout}. The same batch repeats every 2 minutes, and the second time around only ~10-15 fail. After that, subsequent batches produce no :connect_timeout errors. With HTTPoison, I'm passing [timeout: 1000, recv_timeout: 5000] as config.
This has been consistently happening for some time now: errors in the first 2 batches, then all is good. I'm wondering why this is happening, and whether there is some kind of workaround that could be applied? Thanks!
Update
Just wanted to add that all requests are fired within a Task.start/1 function.
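To make the setup a bit more concrete, here's a rough sketch of what the request pattern looks like (the host list, path, body and headers below are placeholders, not the actual application code):

require Logger

hosts = Enum.map(1..30, fn i -> "https://instance#{i}.example.com" end)

Enum.each(hosts, fn host ->
  Task.start(fn ->
    url = host <> "/services/data/v39.0/composite/batch"
    body = ~s({"batchRequests": []})
    headers = [{"authorization", "Bearer [FILTERED]"}, {"content-type", "application/json"}]

    # timeout is the connect timeout (ms), recv_timeout the response timeout (ms)
    case HTTPoison.post(url, body, headers, timeout: 1000, recv_timeout: 5000) do
      {:ok, %HTTPoison.Response{status_code: status}} ->
        Logger.info("#{host} responded with #{status}")

      {:error, %HTTPoison.Error{reason: reason}} ->
        # roughly half of the first batch ends up here with reason == :connect_timeout
        Logger.warn("#{host} failed: #{inspect(reason)}")
    end
  end)
end)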
Other versions: Elixir 1.6, OTP 20.0.
Hello,
Same here (or as https://github.com/benoitc/hackney/issues/462). After a week on a production server, it hangs on {error,connect_timeout}. The only way to fix it is restarting the node.
Plain Erlang 20.1 (erts-9.2) Hackney 1.12.1 (e6d84f40a4d2d650659ac54dba99c0bcbbb86af4).
I'll add the state of the hackney pool here when it happens next time.
@svileng can you eventually provide a trace of your connections? It seems weird to return connect_timeout the first time: if the pool is empty or no host is already connected, it should normally connect directly to it. Also, how many distinct hosts do you have?
@desoulter ^^
@benoitc Thank you for looking into this!
I would guess the pool is empty or at least nearly empty: as soon as my application starts, a GenServer schedules the first batch of requests after 2 minutes, and they are made on the same interval after that. It might be possible that a few individual requests are made before the batch kicks in. The errors consistently happen the first 2 times when the application starts (the second time around always produces fewer errors).
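One way I could try to confirm that guess (assuming hackney_pool:get_stats/1 is the right thing to call here; that's an assumption on my side, not something I've wired in yet) would be to dump the default pool's counters right before the batch fires:

# dump the default hackney pool's stats (e.g. in_use_count / free_count / queue_count)
:hackney_pool.get_stats(:default)
|> IO.inspect(label: "hackney pool stats before batch")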
The number of distinct hosts is exactly 30 (not quite "several" as I said above).
How can I get a trace, could you please point me to the right configuration option?
@svileng you can start to trace with hackney_trace:enable/2:
hackney_trace:enable(max, io).
Put that line before starting all your requests and it should log to output. io can be replaced by a filename as a string.
Hope it helps.
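If you are calling hackney through HTTPoison, the same calls from Elixir look roughly like this (the file name is just an example):

# log everything hackney does to stdout...
:hackney_trace.enable(:max, :io)

# ...or to a file instead (note the filename must be a charlist)
:hackney_trace.enable(:max, 'hackney_trace.log')

# and switch tracing off again once the requests are done
:hackney_trace.disable()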
@benoitc I repeated the issue with {error,connect_timeout} just after making about 20 requests.
A little extra information: my application has a background InfluxDB reporter for exometer which also uses hackney as its HTTP client.
So, here is the pool state:
https://gist.github.com/desoulter/4c8fca893c965f2fb57c600936a7c133
And a trace:
https://gist.github.com/desoulter/841bc24d63ea3e225be8be53d65c2197
@benoitc Apologies for the delay. I enabled the trace locally on my dev machine, where only 1 request is being made, and there is quite a lot of stuff printed out, including some that I cannot share if I run this in production (for example: Authorization headers, IP addresses, bodies, the actual URLs and others...).
What trace/debug information do you need specifically from the requests, and how can I make hackney_trace print out only what we need to debug this issue? Worst case, I can try to remove sensitive information from the trace manually before pasting it here, but that would be a pain...
Thanks!
Update: just ran it briefly in production for one batch (the first one) of requests, before reverting the change. I got 9288 lines. Let me know what's useful so I can grab it out of the file. I noticed quite a lot of no socket in the pool messages, often in batches like the one below:
[hackney trace 80 <0.736.0> 2018:05:16 15:49:43 496] no socket in the pool
Content: [{module,hackney_connect},{line,200},{pool,default}]
[hackney trace 80 <0.712.0> 2018:05:16 15:49:43 496] no socket in the pool
Content: [{module,hackney_connect},{line,200},{pool,default}]
[hackney trace 80 <0.733.0> 2018:05:16 15:49:43 496] no socket in the pool
Content: [{module,hackney_connect},{line,200},{pool,default}]
[hackney trace 80 <0.720.0> 2018:05:16 15:49:43 496] no socket in the pool
Content: [{module,hackney_connect},{line,200},{pool,default}]
[hackney trace 80 <0.751.0> 2018:05:16 15:49:43 496] no socket in the pool
Content: [{module,hackney_connect},{line,200},{pool,default}]
[hackney trace 80 <0.758.0> 2018:05:16 15:49:43 496] no socket in the pool
Content: [{module,hackney_connect},{line,200},{pool,default}]
[hackney trace 80 <0.714.0> 2018:05:16 15:49:43 496] no socket in the pool
Content: [{module,hackney_connect},{line,200},{pool,default}]
[hackney trace 80 <0.746.0> 2018:05:16 15:49:43 496] no socket in the pool
Content: [{module,hackney_connect},{line,200},{pool,default}]
[hackney trace 80 <0.730.0> 2018:05:16 15:49:43 496] no socket in the pool
Content: [{module,hackney_connect},{line,200},{pool,default}]
[hackney trace 80 <0.740.0> 2018:05:16 15:49:43 496] no socket in the pool
Content: [{module,hackney_connect},{line,200},{pool,default}]
[hackney trace 80 <0.735.0> 2018:05:16 15:49:43 496] no socket in the pool
Content: [{module,hackney_connect},{line,200},{pool,default}]
[hackney trace 80 <0.725.0> 2018:05:16 15:49:43 496] no socket in the pool
Content: [{module,hackney_connect},{line,200},{pool,default}]
[hackney trace 80 <0.721.0> 2018:05:16 15:49:43 496] no socket in the pool
Content: [{module,hackney_connect},{line,200},{pool,default}]
But I'm not sure if that is standard debug info or an indicator of some sort of problem?
@svileng @desoulter thanks, looking at it. I will revisit that issue sometime tomorrow and provide more feedback.
@svileng this looks standard. If you see anything weird related to the timeout just let me know; anything related to the pool, actually.
@benoitc I see. I'll clean the log file I have and will post it today or tomorrow; you know best what to look for, I'll just clear up the sensitive info.
@benoitc Apologies for the delay; please find the log file here: hackney-log-cleaned.txt
I have removed the following information and replaced it with [FILTERED], in case you're wondering what's going on:
- Host info for requests;
- Auth header values and some response headers;
- Request JSON body.
As I mentioned before, there are 30 distinct hosts, which are various Salesforce instances in different regions; the API calls, however, are all the same (you can notice /services/data/v39.0/composite/batch all over the place).
Let me know if you have any questions or if there is anything I can do to help investigate the issue! Thanks!
@benoitc Have you had a chance to look at the logs from me and @desoulter above? Let us know if there's anything we could do to help investigate the issue. Thanks!
@svileng sorry, I missed the initial notifications. I will have a look today. Sorry for the delay.
This happens on 2 load-balanced web servers once in a while, and when it does, all HTTP client requests to different hosts return :connect_timeout on both web servers. I enabled the hackney trace and got the following:
[hackney trace 80 <9427.17789.8> 2019:05:06 07:23:32 4836] request
Content: [{module,hackney},
          {line,293},
          {method,post},
          {url,{hackney_url,hackney_tcp,http,<<"[HOST]">>,
                <<"[PATH]">>,<<"[PATH]">>,<<>>,
                <<>>,"[HOST]",[PORT],<<>>,<<>>}},
          {headers,[]},
          {body,{form,[{<<"[KEY]">>,<<"[VAL]">>},
                       {<<"[KEY]">>,<<"[VAL]">>},
                       {<<"[KEY]">>,<<"[VAL]">>}]}},
          {options,[]}]
[hackney trace 60 <9427.17789.8> 2019:05:06 07:23:32 4836] request without proxy
Content: [{module,hackney},{line,665}]
[hackney trace 60 <9427.17789.8> 2019:05:06 07:23:32 4836] connect
Content: [{module,hackney_connect},
          {line,33},
          {transport,hackney_tcp},
          {host,"[HOST]"},
          {port,[PORT]},
          {dynamic,true}]
{:error, [reason: :connect_timeout]}
Restarting the process resolves this.
@larryweya sounds like this is a slightly different issue from mine (but maybe related? @benoitc will know), since my :connect_timeout errors actually disappear after a few retries.
For the record, I'm still experiencing this in hackney 1.15.1.