hackney
Unexpected `enomem` error during file downloading
Hi there!
We tried to migrate from 1.18.1 to 1.20.1 and saw an increased error rate when downloading large (~1GB) files.
I've published an MRE: https://gist.github.com/viralpraxis/6209746ee3b108473452ee810678f769
You can run it via `mix run -e 'Runner.start($URI)'`, where `$URI` points to a file over 1GB (serving it with `python -m http.server` is fine).
On 1.18.1:
21:47:47.878 [info] 2: Streaming completed
21:47:47.946 [info] 1: Streaming completed
21:47:47.990 [info] 5: Streaming completed
21:47:48.055 [info] 4: Streaming completed
21:47:48.066 [info] 3: Streaming completed
On 1.20.1:
21:48:12.494 [error] Error while streaming: enomem
21:48:12.494 [error] Error while streaming: enomem
21:48:12.494 [error] Error while streaming: enomem
21:48:12.494 [error] Error while streaming: enomem
21:48:12.494 [error] Error while streaming: enomem
In fact, this error is already reproducible on 1.18.2. I bisected it to https://github.com/benoitc/hackney/commit/5e74354a48653fe2456688f80c6bccb11143f6af (the previous commit, https://github.com/benoitc/hackney/commit/e3872f768a4f0b74c20a03c5e23ea9652d811f0e, works fine).
elixir 1.16.1
erlang 26.2.2
Please let me know if you need any additional information. Thanks!
Can you reproduce it with the latest version?
@benoitc just checked out 1.24.1 -- this is still valid.
I can confirm that downgrading to 1.18.1 fixes the enomem issue for me. In my case, it happens when downloading a 100 MB file from Azure Blob Storage (azurex -> httpoison -> hackney).
elixir 1.18.4 & erlang 27.3.4.1
This is interesting ...
- Before we bumped Hackney to 1.24.1, we had it pinned to eca5fbb1ff2d84facefb2a633e00f6ca16e7ddfd
- Then we bumped Hackney to 1.24.1, Elixir to 1.18.4 (was 1.16.2) and Erlang to 27.3.4.1 (was 26.2.2) and the `enomem` issue appeared
- I did downgrade Hackney to 1.18.1 and the `enomem` issue is gone
- I bumped Hackney to eca5fbb1ff2d84facefb2a633e00f6ca16e7ddfd (basically 1.20.1 + some fixes) and the issue is back
... IOW, the pattern I see here is ...
- `enomem` in Elixir 1.18.4 & Erlang 27.3.4.1
- no `enomem` in Elixir 1.16.2 & Erlang 26.2.2
I have also encountered this today. A couple of data points:
- The file size is 104MiB, so not huge
- we use `ExAWS.S3`, which leverages `hackney` as its default client
- for some unknown reason, it happens on my Mac, but not on our production (Ubuntu Noble) docker container (yet at least)
For comparison, we are on:
- Erlang/OTP 27 [erts-15.2.7] [source] [64-bit] [smp:16:16] [ds:16:16:10] [async-threads:1] [jit]
- Elixir 1.18.4 (compiled with Erlang/OTP 27)
- hackney 1.24.1 if I'm correct
Alright, I took a closer look at the suspicious commit https://github.com/benoitc/hackney/commit/5e74354a48653fe2456688f80c6bccb11143f6af. It seems that Hackney attempts to allocate N bytes of memory when downloading a file of size N. This means that downloading a 1GB file would try to allocate 1GB of memory, which is very likely to fail.
https://github.com/benoitc/hackney/blob/e2bbdf741ee374c872da2baadc7451b66644b421/src/hackney_response.erl#L369
I'm fairly sure it should be removed and a more reasonable buffer size (e.g., 512KB, or possibly just the default value of zero) should be used.
I've opened https://github.com/benoitc/hackney/pull/774. I'm not familiar with the codebase, so I'm not sure if it's correct. But at least it resolves the enomem issue (tested with a 2GB download), and I hope it helps move us toward a valid fix more quickly.
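To make the bounded-buffer idea above concrete, here is a minimal Elixir sketch of the arithmetic only. It is not taken from hackney or from the pull request; the module name `RecvSizeSketch`, its functions, and the 512 KB default are hypothetical and simply reuse the example figure mentioned above.

```elixir
defmodule RecvSizeSketch do
  @moduledoc false

  # Hypothetical cap on a single read. 512 KB is the example figure from the
  # discussion, not a value taken from hackney.
  @default_cap 512 * 1024

  # Length to request for the next read: never more than the cap and never
  # more than what is left of the body.
  def next_length(remaining, cap \\ @default_cap)
      when is_integer(remaining) and remaining >= 0 and is_integer(cap) and cap > 0 do
    min(remaining, cap)
  end

  # Number of capped reads needed to consume `total` bytes (ceiling division).
  def read_count(total, cap \\ @default_cap)
      when is_integer(total) and total >= 0 and is_integer(cap) and cap > 0 do
    div(total + cap - 1, cap)
  end
end

one_gb = 1024 * 1024 * 1024
IO.inspect(RecvSizeSketch.next_length(one_gb)) # 524288
IO.inspect(RecvSizeSketch.read_count(one_gb))  # 2048
```

With a cap like this, a 1GB body would be requested as 2048 reads of 512 KB each rather than one 1GB read, so the size of any single request no longer grows with the file.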
It seems that the amount of data is constrained by the TCP port driver (written in C), which refuses requests for packets larger than 64MB:
https://github.com/erlang/otp/blob/359e254aba76c1986b671b45fd320c6cc6720ca8/erts/emulator/drivers/common/inet_drv.c#L1297
https://github.com/erlang/otp/blob/359e254aba76c1986b671b45fd320c6cc6720ca8/erts/emulator/drivers/common/inet_drv.c#L12177-L12178
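The driver limit described above can be probed without hackney. The following is a rough sketch that uses only `:gen_tcp` on a loopback connection; it assumes the default `inet` backend (not `inet_backend: :socket`) and a passive socket in `packet: :raw` mode. The variable names are illustrative. Going by the linked driver code, the over-limit call is expected to return `{:error, :enomem}` immediately, while the call at the limit should just time out because no data is ever sent; treat both as expectations to verify on your own OTP version.

```elixir
# Loopback listener and client, both passive and in raw packet mode.
{:ok, listener} =
  :gen_tcp.listen(0, [:binary, active: false, packet: :raw, ip: {127, 0, 0, 1}])

{:ok, port} = :inet.port(listener)

{:ok, client} =
  :gen_tcp.connect({127, 0, 0, 1}, port, [:binary, active: false, packet: :raw])

{:ok, server} = :gen_tcp.accept(listener, 1_000)

# 64 MB, the limit mentioned in the thread.
limit = 64 * 1024 * 1024

# Nothing is ever sent on this connection, so a short timeout (100 ms)
# keeps the calls from blocking.
at_limit = :gen_tcp.recv(client, limit, 100)
over_limit = :gen_tcp.recv(client, limit + 1, 100)

IO.inspect(at_limit, label: "recv(limit)")
IO.inspect(over_limit, label: "recv(limit + 1)")

Enum.each([client, server, listener], &:gen_tcp.close/1)
```

If the second call does report `:enomem`, that matches the symptom in this issue: a single receive request sized to the whole response body is rejected once the body exceeds the limit, regardless of how much memory is actually free.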
Seeing
%HTTPoison.Error{reason: :enomem, id: nil}
when GETting a ~75MB file from Google Storage bucket.
elixir 1.18.4, erlang 28.0.1, hackney 1.24.1
Yeah, apparently this happens to any file >= 64MB. I believe https://github.com/benoitc/hackney/pull/774 should fix that.
For what it's worth, there is another pull request from a while ago (https://github.com/benoitc/hackney/pull/746) that also tries to fix some apparently buggy behavior coming from the same original "Body parsing optimization" (https://github.com/benoitc/hackney/pull/710).
I wonder whether that also fixes this enomem issue or not...