reqwest-middleware
Marking additional `std::io::ErrorKind` variants as transient (Cloudflare bad TLS packets)
Motivations
We were investigating flaky requests to Cloudflare. The requests already had a generous retry limit, but the failures were classified as Fatal by the default policy, so they were never retried.
As it turns out, one of the errors looked like:
reqwest::Error {
    kind: Request,
    url: Url { ... },
    source: hyper_util::client::legacy::Error(SendRequest, hyper::Error(Io, Custom { kind: InvalidData, error: "received fatal alert: BadRecordMac" }))
}
There are various reports of this BadRecordMac (rustls) or ERR_SSL_BAD_RECORD_MAC_ALERT (openssl) when using Cloudflare.
Retrying mitigates the issue, but since it's considered Fatal instead of Transient, the request fails.
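For illustration, the `std::io::Error` in a report like the one above sits several layers down the `source()` chain. Below is a minimal, std-only sketch of how it can be reached. `Wrapper` and `find_io_error` are hypothetical names used to stand in for the outer error types; they are not part of reqwest or reqwest-retry.

```rust
use std::error::Error;
use std::fmt;
use std::io;

/// Hypothetical stand-in for an outer error (such as `reqwest::Error`)
/// that wraps a lower-level cause.
#[derive(Debug)]
struct Wrapper {
    source: Box<dyn Error + 'static>,
}

impl fmt::Display for Wrapper {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        write!(f, "wrapped error")
    }
}

impl Error for Wrapper {
    fn source(&self) -> Option<&(dyn Error + 'static)> {
        Some(&*self.source)
    }
}

/// Walks the `source()` chain, starting at `error` itself, and returns
/// the first `std::io::Error` it finds, if any.
fn find_io_error<'a>(error: &'a (dyn Error + 'static)) -> Option<&'a io::Error> {
    let mut current = Some(error);
    while let Some(err) = current {
        if let Some(io_err) = err.downcast_ref::<io::Error>() {
            return Some(io_err);
        }
        current = err.source();
    }
    None
}
```

With an `InvalidData` error nested two levels deep, `find_io_error` returns it and `kind()` can then be inspected, which is the value the default classification treats as Fatal.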
Solution
Update classify_io_error to mark this error as transient.
 fn classify_io_error(error: &std::io::Error) -> Retryable {
     match error.kind() {
-        std::io::ErrorKind::ConnectionReset | std::io::ErrorKind::ConnectionAborted => {
+        std::io::ErrorKind::ConnectionReset | std::io::ErrorKind::ConnectionAborted | std::io::ErrorKind::InvalidData => {
             Retryable::Transient
         }
         _ => Retryable::Fatal,
     }
 }
Alternatives
Mark even more variants as transient. I haven't investigated all of them, but going by their descriptions, these might qualify:
- https://doc.rust-lang.org/std/io/enum.ErrorKind.html#variant.BrokenPipe
- https://doc.rust-lang.org/std/io/enum.ErrorKind.html#variant.TimedOut
- https://doc.rust-lang.org/std/io/enum.ErrorKind.html#variant.Interrupted
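As a rough sketch of this alternative, the broader match could look like the following. It uses only the standard library: the `Retryable` enum here is a local stand-in for the type used in the diff, and `classify_io_error_broad` is a hypothetical name. Whether each extra variant is actually safe to retry has not been verified.

```rust
use std::io;

/// Local stand-in for the `Retryable` type in the diff, so that this
/// sketch compiles on its own.
#[derive(Debug, PartialEq, Eq)]
enum Retryable {
    Transient,
    Fatal,
}

/// Hypothetical broader classification: the two variants already treated
/// as transient, `InvalidData` from the proposed fix, and the three
/// unverified candidates listed above.
fn classify_io_error_broad(error: &io::Error) -> Retryable {
    match error.kind() {
        io::ErrorKind::ConnectionReset
        | io::ErrorKind::ConnectionAborted
        | io::ErrorKind::InvalidData
        | io::ErrorKind::BrokenPipe
        | io::ErrorKind::TimedOut
        | io::ErrorKind::Interrupted => Retryable::Transient,
        // Everything else stays fatal, as in the current behavior.
        _ => Retryable::Fatal,
    }
}
```

One caveat to weigh: retrying after `BrokenPipe` or `TimedOut` may resend a request whose side effects already took place, so the broader set is probably safest for idempotent requests.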
Additional context
Tested with
- reqwest-retry 0.7.0
- reqwest-middleware 0.4.0
- reqwest 0.12.4 (including rustls-tls-native-roots)
- hyper 1.3.1