
Add retries for package installations

Open pavelzw opened this issue 1 year ago • 10 comments

Problem description

In fragile networks with higher packet loss, we often run into issues like the following:

    × failed to fetch msgpack-python-1.0.8-py311h52f7536_0.conda
    ├─▶ error sending request for url (https://my.conda.mirror.com/
    │   artifactory/conda-forge/linux-64/msgpack-python-1.0.8-
    │   py311h52f7536_0.conda)
    ├─▶ client error (SendRequest)
    ├─▶ connection error
    ╰─▶ bytes remaining on stream

or

    × failed to fetch custom.package-1.0.0-1-.conda
    ├─▶ error sending request for url (https://my.conda.mirror.com/
    │   artifactory/conda-custom-channel/noarch/custom.package-1.0.0-1-
    │   .conda)
    ├─▶ client error (SendRequest)
    ├─▶ connection error
    ╰─▶ peer closed connection without sending TLS close_notify: https://
        docs.rs/rustls/latest/rustls/manual/_03_howto/index.html#unexpected-eof

This is most likely due to networking issues on the first download. It would be nice if pixi tried multiple times and only failed after the fifth attempt or so. Maybe this could also be configurable in the global config?
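
To make the request concrete, here is a minimal sketch of a bounded retry loop with a configurable attempt limit. It is plain Python for illustration only, not pixi or rattler code: `RetryConfig`, `fetch_with_retries`, and both setting names are hypothetical, and treating `ConnectionError` as the retryable failure is an assumption.

```python
import sys
import time
from dataclasses import dataclass
from typing import Callable, TypeVar

T = TypeVar("T")


@dataclass(frozen=True)
class RetryConfig:
    """Hypothetical settings; these are not pixi's actual configuration keys."""

    max_attempts: int = 5  # total attempts, including the first one
    delay_seconds: float = 1.0  # fixed pause between attempts


def fetch_with_retries(fetch: Callable[[], T], config: RetryConfig = RetryConfig()) -> T:
    """Call `fetch` until it succeeds or `max_attempts` is exhausted."""
    attempt = 1
    while True:
        try:
            return fetch()
        except ConnectionError as err:
            if attempt >= config.max_attempts:
                raise  # give up and surface the last error
            # Warn on stderr so retries are visible after the fact.
            print(f"WARN attempt {attempt} failed: {err}; retrying", file=sys.stderr)
            time.sleep(config.delay_seconds)
            attempt += 1
```

With `max_attempts=5`, a download that fails on the first two attempts succeeds on the third, and the two warnings on stderr record that retries happened.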

pavelzw avatar Sep 06 '24 12:09 pavelzw

Just to be sure, this happens during pixi install or something along those lines? In general, we used to have a retry client. It's possible that it was lost during a refactor.

The retry client only retries on certain network errors (e.g. 5xx, I think). It could be that some network errors aren't retried because the client assumes something bigger is faulty :D
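
As an illustration of that kind of policy, a retry client usually needs a predicate that decides which failures are worth retrying. The sketch below is an assumption about what such a rule could look like, not a description of what reqwest-retry or rattler actually does; `should_retry` and its parameters are hypothetical names.

```python
from typing import Optional


def should_retry(status: Optional[int] = None, error: Optional[BaseException] = None) -> bool:
    """Hypothetical policy: retry server-side errors and dropped connections."""
    if error is not None:
        # ConnectionError covers resets and aborted connections;
        # TimeoutError covers requests that timed out.
        return isinstance(error, (ConnectionError, TimeoutError))
    if status is None:
        return False
    # 5xx: the server failed; 408 and 429: the server asks the client to try again.
    return 500 <= status <= 599 or status in (408, 429)
```

Under this rule a 503 response or a reset connection is retried, while a 404 is not, because repeating the request is unlikely to change the answer.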

wolfv avatar Sep 06 '24 12:09 wolfv

Yes, this is happening during pixi install --locked

pavelzw avatar Sep 06 '24 12:09 pavelzw

Is there any way to see retroactively whether pixi retried or not? It might make sense for pixi to write a warning to stderr 🤔

pavelzw avatar Sep 06 '24 12:09 pavelzw

This is the one we're using: https://docs.rs/reqwest-retry/latest/reqwest_retry/

I do think we can customize the function in order to add logging.

wolfv avatar Sep 06 '24 13:09 wolfv

From what I can see, pixi install should do retries on some error types. What's interesting is that in all my failures, I haven't seen WARN failed to download and extract ... in my logs (which should be there in the default log level)

https://github.com/conda/rattler/blob/4885895f8af18321a50b8c376c5d42231a3d3743/crates/rattler_cache/src/package_cache/mod.rs#L241

So it seems to me that it doesn't retry for the issues that I have 🤔

pavelzw avatar Sep 10 '24 13:09 pavelzw

@baszalmstra was https://github.com/conda/rattler/pull/837 in any way related to this issue?

pavelzw avatar Sep 10 '24 13:09 pavelzw

we experience these issues with pixi 0.28.0, haven't tried newer pixi versions yet... i'll check 0.29.0 in the coming days

pavelzw avatar Sep 10 '24 13:09 pavelzw

So i tried it out with 0.29.0 and the errors are still there unfortunately.


Since i didn't see any warning messages (which should be included in the default log level, right?) I'm assuming that pixi is still not retrying on these errors 🤔 https://github.com/conda/rattler/blob/1a463eb5bd17eb6d5a7df1622e258aac09d982e0/crates/rattler_cache/src/package_cache/mod.rs#L241

pavelzw avatar Sep 26 '24 09:09 pavelzw

i got some debug logs with -vv where the issue occurs:

...
   WARN rattler_cache::package_cache: failed to download and extract https://conda.anaconda.org/conda-forge/linux-64/libevent-2.1.12-hf998b51_1.conda to /home/runner/.cache/rattler/cache/pkgs/libevent-2.1.12-hf998b51_1: an io error occurred. Retry #1, Sleeping 1.642122237s until the next attempt...
   WARN rattler_cache::package_cache: failed to download and extract https://conda.anaconda.org/conda-forge/linux-64/pynacl-1.5.0-py312h98912ed_3.conda to /home/runner/.cache/rattler/cache/pkgs/pynacl-1.5.0-py312h98912ed_3: an io error occurred. Retry #1, Sleeping 777.906958ms until the next attempt...
   WARN rattler_cache::package_cache: failed to download and extract https://conda.anaconda.org/conda-forge/linux-64/libstdcxx-14.1.0-hc0a3c3a_1.conda to /home/runner/.cache/rattler/cache/pkgs/libstdcxx-14.1.0-hc0a3c3a_1: an io error occurred. Retry #1, Sleeping 1.765293491s until the next attempt...
   WARN rattler_cache::package_cache: failed to download and extract https://conda.anaconda.org/conda-forge/linux-64/tornado-6.4.1-py312h66e93f0_1.conda to /home/runner/.cache/rattler/cache/pkgs/tornado-6.4.1-py312h66e93f0_1: an io error occurred. Retry #1, Sleeping 177.068926ms until the next attempt...
   WARN rattler_cache::package_cache: failed to download and extract https://conda.anaconda.org/conda-forge/noarch/hiplot-0.1.33-pyhd8ed1ab_0.tar.bz2 to /home/runner/.cache/rattler/cache/pkgs/hiplot-0.1.33-pyhd8ed1ab_0: an io error occurred. Retry #1, Sleeping 1.763704962s until the next attempt...
   WARN rattler_cache::package_cache: failed to download and extract https://conda.anaconda.org/conda-forge/noarch/pip-24.2-pyh8b19718_1.conda to /home/runner/.cache/rattler/cache/pkgs/pip-24.2-pyh8b19718_1: an io error occurred. Retry #1, Sleeping 1.693506324s until the next attempt...
   WARN rattler_cache::package_cache: failed to download and extract https://conda.anaconda.org/conda-forge/linux-64/libtiff-4.6.0-h46a8edc_4.conda to /home/runner/.cache/rattler/cache/pkgs/libtiff-4.6.0-h46a8edc_4: an io error occurred. Retry #1, Sleeping 1.720698875s until the next attempt...
   WARN rattler_cache::package_cache: failed to download and extract https://conda.anaconda.org/conda-forge/linux-64/graphviz-12.0.0-hba01fac_0.conda to /home/runner/.cache/rattler/cache/pkgs/graphviz-12.0.0-hba01fac_0: an io error occurred. Retry #1, Sleeping 1.004458089s until the next attempt...
   WARN rattler_cache::package_cache: failed to download and extract https://conda.anaconda.org/conda-forge/linux-64/pixman-0.43.2-h59595ed_0.conda to /home/runner/.cache/rattler/cache/pkgs/pixman-0.43.2-h59595ed_0: an io error occurred. Retry #1, Sleeping 55.454116ms until the next attempt...
...

the whole CI run failed within 10s; what do you think about doing non-concurrent retries?
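
A rough sketch of what non-concurrent retries could mean: the first pass downloads everything concurrently, and only the packages that failed are retried one at a time. This is an illustration in plain Python under stated assumptions, not rattler's implementation; `download_all`, `fetch`, and the parameter names are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, Iterable


def download_all(
    urls: Iterable[str],
    fetch: Callable[[str], bytes],
    max_workers: int = 8,
    sequential_retries: int = 3,
) -> Dict[str, bytes]:
    """Fetch concurrently first, then retry only the failures sequentially."""
    results: Dict[str, bytes] = {}
    failed: Dict[str, Exception] = {}

    def try_fetch(url: str):
        # Capture the error instead of raising so one failure
        # does not abort the whole concurrent pass.
        try:
            return url, fetch(url), None
        except ConnectionError as err:
            return url, None, err

    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        for url, data, err in pool.map(try_fetch, list(urls)):
            if err is None:
                results[url] = data
            else:
                failed[url] = err

    # Second pass: one request at a time, so retries do not compete
    # with each other for the same flaky connection.
    for url, first_error in failed.items():
        last_error = first_error
        for _ in range(sequential_retries):
            try:
                results[url] = fetch(url)
                break
            except ConnectionError as err:
                last_error = err
        else:
            raise last_error  # every sequential retry failed
    return results
```

The trade-off is that sequential retries are slower when many packages fail, but they put less load on a mirror that is already struggling.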

pavelzw avatar Oct 03 '24 13:10 pavelzw

also @baszalmstra do you think we could make https://github.com/conda/rattler/blob/b44887563c20aa9973da39b7a01eb72c37e09d91/crates/rattler_package_streaming/src/lib.rs#L27-L28 more informative?

pavelzw avatar Oct 03 '24 13:10 pavelzw

@baszalmstra please reopen. I don't think we have this exact issue anymore because we switched our Artifactory instance to something more scalable, but I still think we can improve the warning messages

pavelzw avatar Oct 02 '25 07:10 pavelzw

I don't feel this is particularly actionable at this point. Since you also don't run into this anymore, I'm inclined to leave it closed until someone else runs into this issue.

baszalmstra avatar Oct 02 '25 07:10 baszalmstra