spurious network errors while running CI for Solaris on vmactions/solaris-vm
Problem
We have started to see following network errors:
warning: spurious network error (3 tries remaining): [28] Timeout was reached (Operation too slow. Less than 10 bytes/sec transferred the last 30 seconds)
as reported here: https://github.com/quinn-rs/quinn/issues/2218
Note that it's running Solaris under QEMU in Linux. I guess that there might have been some changes to QEMU and/or Linux underneeth which enabled these problems. Unfortunatelly main victim seems to be Solaris.
I have come with following workaround. But the cost seems to be longer times...
https://github.com/quinn-rs/quinn/pull/2243
Hmm… so 1.86.0 has caused some issues? We did bump curl to from 8.9.0 to 8.12.0 in Rust 1.86: https://github.com/rust-lang/cargo/pull/15162. I wonder if it is possible for you to downgrade to old curl and see what is going on.
This environment could also be set to show more useful debugging info:
CARGO_HTTP_DEBUG=true CARGO_LOG=cargo::ops::registry=debug cargo fetch
Hmm… so 1.86.0 has caused some issues? We did bump curl to from 8.9.0 to 8.12.0 in Rust 1.86: #15162. I wonder if it is possible for you to downgrade to old curl and see what is going on.
I don't think it's related to cargo version. I have tried version 1.82 and it had the same problems. But originally when Solaris CI was integrated it used this version (1.82) for some time without issue.
Workaround seems to be to set CARGO_HTTP_MULTIPLEXING=false.
Workaround seems to be to set
CARGO_HTTP_MULTIPLEXING=false.
Cargo already has some built-in workaround for disabling multiplexing under some circumstances. Like this and this. However, due to the unusual interaction (Solaris under QEMU), not sure if we'd like to have a similar approach built-in before knowing the root cause.
This environment could also be set to show more useful debugging info:
CARGO_HTTP_DEBUG=true CARGO_LOG=cargo::ops::registry=debug cargo fetch
I don't see any debug info with this:
CARGO_HTTP_DEBUG=true CARGO_LOG=cargo::ops::registry=debug cargo fetch
cargo build --locked --all-targets && cargo test --locked --manifest-path fuzz/Cargo.toml && cargo test --locked -p quinn-udp --benches
/usr/bin/bash /home/runner/work/_actions/vmactions/solaris-vm/v1/run.sh execSSHSH
Config file: solaris-11.4-gcc.conf
Updating crates.io index
Downloading crates ...
Downloaded windows-targets v0.52.6
Downloaded untrusted v0.9.0
Downloaded subtle v2.6.1
Downloaded write16 v1.0.0
Downloaded icu_provider_macros v1.5.0
warning: spurious network error (3 tries remaining): [18] Transferred a partial file (transfer closed with 229231 bytes remaining to read)
warning: spurious network error (3 tries remaining): [18] Transferred a partial file (transfer closed with 29962 bytes remaining to read)
warning: spurious network error (3 tries remaining): [18] Transferred a partial file (transfer closed with 47660 bytes remaining to read)
warning: spurious network error (3 tries remaining): [18] Transferred a partial file (transfer closed with 52722 bytes remaining to read)
warning: spurious network error (3 tries remaining): [18] Transferred a partial file (transfer closed with 16971 bytes remaining to read)
warning: spurious network error (3 tries remaining): [18] Transferred a partial file (transfer closed with 7525 bytes remaining to read)
warning: spurious network error (3 tries remaining): [18] Transferred a partial file (transfer closed with 29673 bytes remaining to read)
warning: spurious network error (3 tries remaining): [18] Transferred a partial file (transfer closed with 18327 bytes remaining to read)
warning: spurious network error (3 tries remaining): [28] Timeout was reached (download of `zeroize v1.8.1` failed to transfer more than 1 bytes in 90s)
warning: spurious network error (3 tries remaining): [28] Timeout was reached (failed to download any data for `rustls v0.23.27` within 90s)
Ok I now see it also on my VirtualBox. So I don't think it's QEMU related.
test@virt75:~/quinn-ci-test2$ CARGO_HTTP_DEBUG=true CARGO_LOG=cargo::ops::registry=debug cargo fetch
Updating crates.io index
Downloaded wasm-bindgen-macro v0.2.100
Downloaded subtle v2.6.1
Downloaded unicode-ident v1.0.18
Downloaded bumpalo v3.17.0
Downloaded wasm-bindgen-backend v0.2.100
Downloaded rustls-webpki v0.103.3
Downloaded synstructure v0.13.1
Downloaded zeroize v1.8.1
Downloaded wasm-bindgen-macro-support v0.2.100
Downloaded pin-project v1.1.10
warning: spurious network error (3 tries remaining): [18] Transferred a partial file (transfer closed with 8869 bytes remaining to read)
Downloaded matchers v0.1.0
Downloaded web-time v1.1.0
Downloaded js-sys v0.3.77
Downloaded rustversion v1.0.20
Downloaded quote v1.0.40
Downloaded value-bag v1.11.1
Downloaded signal-hook-registry v1.4.2
Downloaded rustls-pki-types v1.11.0
Downloaded windows-targets v0.52.6
Downloaded rustls v0.23.27
Downloaded untrusted v0.9.0
Downloaded wasm-bindgen v0.2.100
Downloaded thiserror-impl v2.0.12
Downloaded adler2 v2.0.0
Downloaded tracing-futures v0.2.5
warning: spurious network error (3 tries remaining): [16] Error in the HTTP2 framing layer
warning: spurious network error (3 tries remaining): [18] Transferred a partial file (transfer closed with 27991 bytes remaining to read)
warning: spurious network error (3 tries remaining): [18] Transferred a partial file (transfer closed with 16310 bytes remaining to read)
warning: spurious network error (3 tries remaining): [18] Transferred a partial file (transfer closed with 58239 bytes remaining to read)
Downloaded cfg-if v1.0.0
warning: spurious network error (3 tries remaining): [18] Transferred a partial file (transfer closed with 49252 bytes remaining to read)
warning: spurious network error (3 tries remaining): [18] Transferred a partial file (transfer closed with 34446 bytes remaining to read)
warning: spurious network error (3 tries remaining): [18] Transferred a partial file (transfer closed with 10223 bytes remaining to read)
warning: spurious network error (3 tries remaining): [18] Transferred a partial file (transfer closed with 16970 bytes remaining to read)
warning: spurious network error (3 tries remaining): [18] Transferred a partial file (transfer closed with 42691 bytes remaining to read)
warning: spurious network error (3 tries remaining): [18] Transferred a partial file (transfer closed with 22654 bytes remaining to read)
Downloaded wasm-bindgen-shared v0.2.100
Downloaded shlex v1.3.0
error: failed to verify the checksum of `shlex v1.3.0`
Maybe it can be related to old CURL 7.79.0 (Old Solaris CBE release 11.4.42). While on baremetal with latest Solaris system (CURL 8.12.0) I don't see it...
The aforementioned patch looks like this:
https://github.com/rust-lang/cargo/blob/47c911e9e6f6461f90ce19142031fe16876a3b95/src/cargo/util/context/mod.rs?plain=1#L3117-L3143
We could do stuff once we know the pattern of the failure. (sorry I don't have those machines so can only rely on people to test it 😞)
Ok, this can be fixed also via:
--- rustc-1.86.0-src/src/tools/cargo/src/cargo/util/context/mod.rs
+++ rustc-1.86.0-src/src/tools/cargo/src/cargo/util/context/mod.rs
@@ -3020,7 +3020,7 @@
use crate::util::network;
if network::proxy::http_proxy_exists(http, gctx) && http.multiplexing.is_none() {
- let bad_curl_versions = ["7.87.0", "7.88.0", "7.88.1"];
+ let bad_curl_versions = ["7.79.0", "7.87.0", "7.88.0", "7.88.1"];
if bad_curl_versions
.iter()
.any(|v| curl_version.starts_with(v))
Note that CURL is 7.79.0 is bundled with older version of Oracle Solaris CBE 11.4.42 which is still in use with Github vmactions/solaris-vm.
This environment could also be set to show more useful debugging info:
CARGO_HTTP_DEBUG=true CARGO_LOG=cargo::ops::registry=debug cargo fetchI don't see any debug info with this:
Sorry should be something lke
CARGO_HTTP_DEBUG=true CARGO_LOG=network=debug cargo fetch
Sorry should be something lke
CARGO_HTTP_DEBUG=true CARGO_LOG=network=debug cargo fetch
Thank you! Now there is too much data...
You probably want to disable libcurl low speed limit in any case with CARGO_HTTP_LOW_SPEED_LIMIT=0. Default is 10 which means if less than 10 bytes are downloaded per timeout (30 seconds), connection is aborted. It is enough to have a 30-second timeout, if nothing is read in 30 seconds then you abort the connection. This libcurl feature is a frequent source of intermittent failures and it's rare that reconnecting makes the speed higher. I don't like that it is configured by default.
You will likely still get timeouts if connection got stuck completely, but at least you will know that you have not received anything at all for 30 seconds.