Don't spend an hour on crates that time out during the build
Currently, if a build hits the 15-minute timeout, we end up spending an hour in total attempting to build the release. We do 4 builds in total, each with the full timeout available to it:
1. With the crate's `Cargo.lock`:
   - a. Generate coverage data
   - b. Build docs
2. After deleting the lock:
   - a. Generate coverage data
   - b. Build docs
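To make the cost concrete, here is a minimal Rust sketch, assuming each of the four invocations gets the full 15-minute limit independently. The `Step` enum and the `TIMEOUT` and `STEPS` constants are hypothetical names for illustration, not docs.rs code. In the worst case this adds up to 4 × 15 = 60 minutes.

```rust
use std::time::Duration;

/// Assumed per-invocation limit (the 15-minute timeout).
const TIMEOUT: Duration = Duration::from_secs(15 * 60);

/// Hypothetical names for the four `cargo rustdoc` invocations.
#[derive(Debug, Clone, Copy, PartialEq)]
enum Step {
    CoverageLocked,
    DocsLocked,
    CoverageUnlocked,
    DocsUnlocked,
}

/// Current order: both steps with `Cargo.lock`, then both without it.
const STEPS: [Step; 4] = [
    Step::CoverageLocked,
    Step::DocsLocked,
    Step::CoverageUnlocked,
    Step::DocsUnlocked,
];

/// Worst case today: every step runs until it hits its own full timeout.
fn worst_case_current() -> Duration {
    TIMEOUT * STEPS.len() as u32
}

fn main() {
    for step in STEPS.iter() {
        println!("{:?}: up to {} minutes", step, TIMEOUT.as_secs() / 60);
    }
    println!("total: up to {} minutes", worst_case_current().as_secs() / 60);
}
```

Because each step has its own budget rather than sharing one, the limits add up instead of capping the whole attempt at 15 minutes.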
It would be better to generate the coverage data only if the build succeeded, so that we would only attempt builds 1.b and 2.b before deciding the crate timed out, but that has issues:
https://github.com/rust-lang/docs.rs/blob/2e5ef9b6d8f13b527436893a9a5e3e67019d5fb3/src/docbuilder/rustwide_builder.rs#L658-L660
One idea would be to skip the subsequent steps if one fails because of a timeout rather than a build error. It seems unlikely that unlocking the crate will turn a timeout into a successful build, or that `rustdoc --show-coverage` would itself be the cause of a timeout rather than one of the dependencies.
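A minimal sketch of that idea, assuming each `cargo rustdoc` invocation can be classified as a success, a build error, or a timeout. `Outcome`, `run_steps`, `worst_case_time` and `TIMEOUT` are hypothetical names, not the actual `rustwide_builder.rs` API, and any other early exit the real builder performs is left out. The loop keeps going after an ordinary build error but stops at the first timeout, so a crate that always times out costs one 15-minute slot instead of four.

```rust
use std::time::Duration;

/// Assumed per-invocation limit (the 15-minute timeout).
const TIMEOUT: Duration = Duration::from_secs(15 * 60);

/// Hypothetical classification of one `cargo rustdoc` invocation.
#[derive(Debug, Clone, Copy, PartialEq)]
enum Outcome {
    Success,
    BuildError,
    TimedOut,
}

/// Runs `steps` invocations in order via `run`, returning the outcomes seen.
/// A build error does not stop the sequence (unlocking the dependencies
/// might still help), but a timeout skips every remaining step.
fn run_steps<F>(steps: usize, mut run: F) -> Vec<Outcome>
where
    F: FnMut(usize) -> Outcome,
{
    let mut outcomes = Vec::with_capacity(steps);
    for i in 0..steps {
        let outcome = run(i);
        outcomes.push(outcome);
        if outcome == Outcome::TimedOut {
            break;
        }
    }
    outcomes
}

/// Upper bound on time spent, assuming every step that ran used its full limit.
fn worst_case_time(outcomes: &[Outcome]) -> Duration {
    TIMEOUT * outcomes.len() as u32
}

fn main() {
    // A crate whose every invocation would hit the limit: only one step runs.
    let timed_out = run_steps(4, |_| Outcome::TimedOut);
    println!(
        "{} step(s), up to {} minutes",
        timed_out.len(),
        worst_case_time(&timed_out).as_secs() / 60
    );

    // Ordinary build errors still fall through to the later steps.
    let errors = run_steps(4, |i| if i < 3 { Outcome::BuildError } else { Outcome::Success });
    println!("{:?}", errors);
}
```

The trade-off is the niche case where a later step could still have succeeded after an earlier timeout; this sketch deliberately gives that up.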
I remember that the whole build-attempt topic predates most (all?) of us.
To ask the "real" question: what kinds of errors are actually transient, i.e. would be solved by just trying again?
I imagine network errors, but these won't happen inside the Docker container itself, only outside of it in our builder.
So what remains?
This isn't actually related to retrying build attempts; these are 4 invocations of `cargo rustdoc` within the same build attempt.
It is actually possible for the 4th to succeed in very niche cases: the first two builds might time out because a locked dependency times out, the third might hit a bug in rustdoc specific to `--show-coverage` that makes it time out, and then the 4th finally succeeds and gives us docs. I think that is unlikely enough that we shouldn't spend the extra build time on supporting it.
I see there is still very much I don't know about the build process :)
So in extreme cases we would do 4 times 4 tries?
No, in the end it 'successfully fails', so we don't increment the attempt counter and retry.
Ah, you're right.
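The difference between a "successful failure" and a retried attempt can be sketched as follows. This is a hypothetical model: `AttemptResult`, `build_with_retries` and `MAX_ATTEMPTS` are illustrative names, and the retry limit is an assumption, not docs.rs's configured value. A crate whose four invocations all fail (timeouts included) yields a recorded `BuildFailed` after a single attempt; only errors outside the build itself, such as the network errors mentioned above, would be retried.

```rust
/// Hypothetical result of one whole build attempt.
#[derive(Debug, Clone, Copy, PartialEq)]
enum AttemptResult {
    /// Docs were built.
    Built,
    /// All invocations ran and failed (including timeouts). The failure is
    /// recorded as the crate's result, so the attempt is not retried.
    BuildFailed,
    /// Something outside the build went wrong (e.g. a network error in the
    /// builder); assumed to be transient and worth retrying.
    InfrastructureError,
}

/// Hypothetical retry limit, chosen only for illustration.
const MAX_ATTEMPTS: u32 = 4;

/// Retries only infrastructure errors; stops on the first attempt that
/// either builds or "successfully fails". Returns the final result and
/// how many attempts were used.
fn build_with_retries<F>(mut attempt: F) -> (AttemptResult, u32)
where
    F: FnMut(u32) -> AttemptResult,
{
    let mut attempts = 0;
    loop {
        attempts += 1;
        let result = attempt(attempts);
        if result != AttemptResult::InfrastructureError || attempts >= MAX_ATTEMPTS {
            return (result, attempts);
        }
    }
}

fn main() {
    // A crate that times out: one attempt, recorded as a failed build.
    let (result, attempts) = build_with_retries(|_| AttemptResult::BuildFailed);
    println!("{:?} after {} attempt(s)", result, attempts);

    // Two transient errors, then a successful build on the third attempt.
    let (result, attempts) = build_with_retries(|n| {
        if n < 3 { AttemptResult::InfrastructureError } else { AttemptResult::Built }
    });
    println!("{:?} after {} attempt(s)", result, attempts);
}
```

Under this model the worst case for a crate that times out stays at one attempt of four invocations, not four attempts of four.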