req
Mint.TransportError - socket closed

** (Mint.TransportError) socket closed
    (req 0.4.8) lib/req.ex:1029: Req.request!/
It's quite hard to find a repro for this bug, but it's frequent enough that it shows up regularly in my Oban jobs and even when running code in iex -S mix phx.server. To fix it I just re-run the code, or let the Oban job retry.
Any ideas why this would happen so frequently? Again, I appreciate it's hard to debug without more repro steps but happy to figure out how to find that repro case!
URLs where this happens:
OpenAI's API, HowLongtoBeat's API, Target's RedSky API.
I can quite easily replicate this issue. Just reload the module that makes a Req request in an iex session. For me, after the code reload, I get socket closed.
For anyone running into this issue, please see https://github.com/sneako/finch/pull/273#issuecomment-2144879631, you can test it by setting Finch dependency to:
{:finch, github: "keathley/finch", branch: "handle-idle-connections", override: true}
This is still happening very frequently. I'm calling Req.post from Oban jobs and it happens quite a lot.
Req.post(@base_url,
  headers: headers,
  json: body
)
** (Req.TransportError) socket closed
    (req 0.5.6) lib/req.ex:1092: Req.request!/2
I have the same issue, I have a lot of logs with this error message:
** (Mint.TransportError) socket closed
Req.retry/3: Got exception. Will retry in 0ms, 1 attempt left
Does that mean the request failed and will be retried, or that it failed completely?
It means the request will be retried.
We're observing an abnormally high incidence of Req.TransportError (socket closed) in our Oban jobs sending requests to APIs.
It is expected that this might happen occasionally, but the incidence seems abnormally high compared to what we'd expect, and it doesn't seem correlated with a single service but with all the different APIs we request often. At one point we checked whether any configuration issue in our provisioning could be related.
Is there any guideline, workaround or known issues related to this?
Hey Amadeus, long time no see! Nothing comes to mind. However, if you are able to consistently reproduce this issue, it'd be really appreciated if you could debug this by removing layers: if you can reproduce the same error using Finch directly, then that's a sign the problem is not in Req. And if you can reproduce the problem using Mint directly, maybe it's a matter of socket configuration. It's also worth trying different HTTP clients and seeing if you can reproduce this.
One Oban-specific piece of advice: Req retries GET requests by default, and Oban jobs tend to retry too, so you might be retrying more than expected.
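A hedged sketch of how to avoid that double-retry overlap: disable Req's built-in retry step inside the worker and let Oban own the retry policy. The module name, URL argument, and job args below are placeholders, not taken from the thread.

```elixir
# Sketch (placeholder module/args): disable Req's retry step so only
# Oban's job-level retries apply.
defmodule MyApp.SyncWorker do
  use Oban.Worker, max_attempts: 5

  @impl Oban.Worker
  def perform(%Oban.Job{args: %{"url" => url}}) do
    # retry: false turns off Req's default retry behaviour; a transient
    # :closed then fails this attempt immediately and Oban reschedules it.
    case Req.get(url, retry: false) do
      {:ok, %Req.Response{status: 200} = resp} -> {:ok, resp.body}
      {:ok, resp} -> {:error, {:unexpected_status, resp.status}}
      {:error, exception} -> {:error, exception}
    end
  end
end
```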
Hey Amadeus, long time no see!
Hey Wojtek, glad to see you too! I was wondering if you'd remember me. I'm very happy to see you so active in this community I joined a couple of years ago; keep it up!
One Oban-specific piece of advice: Req retries GET requests by default, and Oban jobs tend to retry too, so you might be retrying more than expected.
Yes, we're aware. Our critical non-idempotent jobs are not allowed to be retried unless carefully inspected to be safe to do so, which exacerbates this issue for us, since we have to track every instance of this error or allow retries for this case. Keep in mind that our volume is still very low compared to our goal, so already observing such a high incidence makes this issue very relevant for us.
For non-critical jobs (still not idempotent; we don't allow retries) we're considering introducing this option (keep get and head as they are; allow retry on any :econnrefused | :closed) to test whether the issue is mitigated:
retry: fn
  %Req.Request{method: method}, response_or_exception when method in [:get, :head] ->
    # Vendored from Req.Steps
    Utilities.Req.transient?(response_or_exception)

  _request, %Req.TransportError{reason: reason} when reason in [:closed, :econnrefused] ->
    true

  _request, _response_or_error ->
    false
end
if you could debug this by removing layers, if you can reproduce the same error using Finch directly then that’s a sign the problem is not in Req
Any suggestion on how we can instrument this so we can provide useful data?
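One low-effort way to gather data is to log Finch's telemetry events around connections and requests: connection reuse vs. fresh connects per host can show whether the errors correlate with stale pooled connections. This is a sketch; the event and metadata names below are from Finch.Telemetry's documentation, so verify them against the Finch version in your lockfile.

```elixir
# Sketch: log Finch connection/request telemetry to correlate
# "socket closed" errors with connection reuse per host.
require Logger

:telemetry.attach_many(
  "finch-socket-closed-debug",
  [
    [:finch, :reused_connection],
    [:finch, :connect, :stop],
    [:finch, :request, :exception]
  ],
  fn event, _measurements, metadata, _config ->
    Logger.info(
      "finch #{inspect(event)} #{inspect(Map.take(metadata, [:host, :port, :kind, :reason]))}"
    )
  end,
  nil
)
```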
We're observing an abnormally high incidence of Req.TransportError (socket closed) in our Oban jobs sending requests to APIs.
It is expected that this might happen occasionally, but the incidence seems abnormally high compared to what we'd expect, and it doesn't seem correlated with a single service but with all the different APIs we request often. At one point we checked whether any configuration issue in our provisioning could be related.
Is there any guideline, workaround or known issues related to this?
This has been my experience as well; I still haven't figured out why it happens.
This is anecdata, but this was a big issue for us, and it seemed to happen only for connections to particular upstream servers (most notably Anthropic API). We couldn't find a root cause in reasonable time, so we switched to HTTPoison for these problematic connections.
We couldn't find a root cause in reasonable time, so we switched to HTTPoison for these problematic connections.
Would you mind sharing whether switching to HTTPoison decreased the number of timeouts (perceived, or hard data if you have it)?
We couldn't find a root cause in reasonable time, so we switched to HTTPoison for these problematic connections.
Would you mind sharing whether switching to HTTPoison decreased the number of timeouts (perceived, or hard data if you have it)?
Yes, we switched around April 2024 and the issue disappeared.
FWIW we're behind an AWS NAT gateway, so the workaround in https://github.com/sneako/finch/issues/272#issuecomment-2145675560 may be a solution for us, but IIRC we also saw the same issue with HTTP/2 pools, which don't have pool_max_idle_time.
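For context on the NAT angle: AWS NAT gateways drop idle flows after roughly 350 seconds without sending a TCP RST, so a pooled connection can look open locally while being dead on the network. A hedged sketch of the idle-connection workaround, closing HTTP/1 connections before that window; the pool name and URL are placeholders, and the relevant knobs (conn_max_idle_time per connection, pool_max_idle_time per pool) should be verified against your Finch/Req versions.

```elixir
# Sketch: in the application supervision tree, cap connection idle time
# below the NAT gateway's ~350 s idle timeout.
children = [
  {Finch,
   name: MyApp.Finch,
   pools: %{
     default: [
       size: 50,
       # close connections that have been idle for more than 5 minutes
       conn_max_idle_time: :timer.minutes(5)
     ]
   }}
]

# Then point Req at this pool explicitly:
# Req.get!("https://api.example.com", finch: MyApp.Finch)
```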