pixi
Add retries for package installations
Problem description
In fragile networks with higher packet loss, we often run into issues like the following:
× failed to fetch msgpack-python-1.0.8-py311h52f7536_0.conda
├─▶ error sending request for url (https://my.conda.mirror.com/
│ artifactory/conda-forge/linux-64/msgpack-python-1.0.8-
│ py311h52f7536_0.conda)
├─▶ client error (SendRequest)
├─▶ connection error
╰─▶ bytes remaining on stream
or
× failed to fetch custom.package-1.0.0-1-.conda
├─▶ error sending request for url (https://my.conda.mirror.com/
│ artifactory/conda-custom-channel/noarch/custom.package-1.0.0-1-
│ .conda)
├─▶ client error (SendRequest)
├─▶ connection error
╰─▶ peer closed connection without sending TLS close_notify: https://
docs.rs/rustls/latest/rustls/manual/_03_howto/index.html#unexpected-eof
This is most likely due to networking issues on the first download. It would be nice if pixi tried multiple times and only failed after the 5th attempt or so. Could this also be configurable in the global config?
To be sure, this is upon pixi install or something along those lines?
In general, we used to have a retry client. It's possible that it was lost during a refactor.
The retry client only retries on certain network errors (e.g. 50x, I think). It could be that certain network errors aren't retried because it assumes something bigger is faulty :D
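To make the "only retry on certain errors" behaviour concrete, here is a minimal sketch using only the Rust standard library. It is not the code from reqwest-retry or rattler: `FetchError`, `is_transient`, and `fetch_with_retries` are hypothetical names, and treating a dropped connection as retryable is an assumption about what this issue is asking for. Each retry also writes a warning to stderr so that retries are visible after the fact.

```rust
use std::thread::sleep;
use std::time::Duration;

/// Hypothetical error classification for illustration only;
/// these are not rattler's or reqwest-retry's actual types.
#[derive(Debug, Clone, PartialEq)]
enum FetchError {
    /// Connection dropped mid-transfer (e.g. "bytes remaining on stream").
    ConnectionReset,
    /// HTTP 5xx response from the mirror.
    Server(u16),
    /// HTTP 4xx response: retrying is assumed not to help.
    Client(u16),
}

/// Assumption: dropped connections and 5xx responses are worth retrying.
fn is_transient(err: &FetchError) -> bool {
    matches!(err, FetchError::ConnectionReset | FetchError::Server(_))
}

/// Calls `fetch` up to `max_attempts` times, sleeping with exponential
/// backoff between attempts, and gives up immediately on permanent errors.
fn fetch_with_retries<T>(
    max_attempts: u32,
    base_delay: Duration,
    mut fetch: impl FnMut() -> Result<T, FetchError>,
) -> Result<T, FetchError> {
    let mut attempt = 1;
    loop {
        match fetch() {
            Ok(value) => return Ok(value),
            Err(err) if is_transient(&err) && attempt < max_attempts => {
                // Backoff doubles each time: base, 2 * base, 4 * base, ...
                let delay = base_delay * 2u32.pow(attempt - 1);
                eprintln!(
                    "WARN fetch failed ({err:?}). Retry #{attempt}, sleeping {delay:?} until the next attempt..."
                );
                sleep(delay);
                attempt += 1;
            }
            // Permanent error, or the attempt budget is exhausted.
            Err(err) => return Err(err),
        }
    }
}
```

With `max_attempts` set to 5, a download that fails twice with `ConnectionReset` and then succeeds would return `Ok` after three calls, while a `Client(404)` would be returned after a single call. A real client would probably also add random jitter to the delay.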
Yes, this is happening during pixi install --locked
Is there any way to see retroactively whether pixi retried or not? It might make sense for pixi to write a warning to stderr 🤔
This is the one we're using: https://docs.rs/reqwest-retry/latest/reqwest_retry/
I do think we can customize the function in order to add logging.
From what I can see, pixi install should do retries on some error types.
What's interesting is that in all my failures, I haven't seen WARN failed to download and extract ... in my logs (which should be there in the default log level)
https://github.com/conda/rattler/blob/4885895f8af18321a50b8c376c5d42231a3d3743/crates/rattler_cache/src/package_cache/mod.rs#L241
So it seems to me that it doesn't retry for the issues that I have 🤔
@baszalmstra was https://github.com/conda/rattler/pull/837 in any way related to this issue?
We experience these issues with pixi 0.28.0 and haven't tried newer pixi versions yet. I'll check 0.29.0 in the coming days.
So I tried it out with 0.29.0 and the errors are still there, unfortunately.
Since I didn't see any warning messages (which should be included in the default log level, right?), I'm assuming that pixi is still not retrying on these errors 🤔 https://github.com/conda/rattler/blob/1a463eb5bd17eb6d5a7df1622e258aac09d982e0/crates/rattler_cache/src/package_cache/mod.rs#L241
I got some debug logs with -vv where the issue occurs:
...
WARN rattler_cache::package_cache: failed to download and extract https://conda.anaconda.org/conda-forge/linux-64/libevent-2.1.12-hf998b51_1.conda to /home/runner/.cache/rattler/cache/pkgs/libevent-2.1.12-hf998b51_1: an io error occurred. Retry #1, Sleeping 1.642122237s until the next attempt...
WARN rattler_cache::package_cache: failed to download and extract https://conda.anaconda.org/conda-forge/linux-64/pynacl-1.5.0-py312h98912ed_3.conda to /home/runner/.cache/rattler/cache/pkgs/pynacl-1.5.0-py312h98912ed_3: an io error occurred. Retry #1, Sleeping 777.906958ms until the next attempt...
WARN rattler_cache::package_cache: failed to download and extract https://conda.anaconda.org/conda-forge/linux-64/libstdcxx-14.1.0-hc0a3c3a_1.conda to /home/runner/.cache/rattler/cache/pkgs/libstdcxx-14.1.0-hc0a3c3a_1: an io error occurred. Retry #1, Sleeping 1.765293491s until the next attempt...
WARN rattler_cache::package_cache: failed to download and extract https://conda.anaconda.org/conda-forge/linux-64/tornado-6.4.1-py312h66e93f0_1.conda to /home/runner/.cache/rattler/cache/pkgs/tornado-6.4.1-py312h66e93f0_1: an io error occurred. Retry #1, Sleeping 177.068926ms until the next attempt...
WARN rattler_cache::package_cache: failed to download and extract https://conda.anaconda.org/conda-forge/noarch/hiplot-0.1.33-pyhd8ed1ab_0.tar.bz2 to /home/runner/.cache/rattler/cache/pkgs/hiplot-0.1.33-pyhd8ed1ab_0: an io error occurred. Retry #1, Sleeping 1.763704962s until the next attempt...
WARN rattler_cache::package_cache: failed to download and extract https://conda.anaconda.org/conda-forge/noarch/pip-24.2-pyh8b19718_1.conda to /home/runner/.cache/rattler/cache/pkgs/pip-24.2-pyh8b19718_1: an io error occurred. Retry #1, Sleeping 1.693506324s until the next attempt...
WARN rattler_cache::package_cache: failed to download and extract https://conda.anaconda.org/conda-forge/linux-64/libtiff-4.6.0-h46a8edc_4.conda to /home/runner/.cache/rattler/cache/pkgs/libtiff-4.6.0-h46a8edc_4: an io error occurred. Retry #1, Sleeping 1.720698875s until the next attempt...
WARN rattler_cache::package_cache: failed to download and extract https://conda.anaconda.org/conda-forge/linux-64/graphviz-12.0.0-hba01fac_0.conda to /home/runner/.cache/rattler/cache/pkgs/graphviz-12.0.0-hba01fac_0: an io error occurred. Retry #1, Sleeping 1.004458089s until the next attempt...
WARN rattler_cache::package_cache: failed to download and extract https://conda.anaconda.org/conda-forge/linux-64/pixman-0.43.2-h59595ed_0.conda to /home/runner/.cache/rattler/cache/pkgs/pixman-0.43.2-h59595ed_0: an io error occurred. Retry #1, Sleeping 55.454116ms until the next attempt...
...
The whole CI run failed within 10s; what do you think about doing non-concurrent retries?
Also, @baszalmstra, do you think we could make https://github.com/conda/rattler/blob/b44887563c20aa9973da39b7a01eb72c37e09d91/crates/rattler_package_streaming/src/lib.rs#L27-L28 more informative?
@baszalmstra please reopen. I don't think we have this explicit issue anymore because we switched our artifactory instance to something more scalable, but I still think we can improve the warning messages.
I don't feel this is particularly actionable at this point. Since you also don't run into this anymore, I'm inclined to leave it closed until someone else runs into this issue.