docs.rs

Don't spend an hour on crates that timeout during the build

Open • Nemo157 opened this issue 3 years ago • 5 comments

Currently, if a build hits the 15-minute timeout (e.g. [email protected]), we end up spending an hour in total attempting to build the release. We run 4 builds in total, each with the full timeout available to it:

  1. With the crate's Cargo.lock:
     a. Generate coverage data
     b. Build docs
  2. After deleting the lock:
     a. Generate coverage data
     b. Build docs
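To make the cost concrete, here is a minimal Rust sketch of the sequence above. `Step`, `Outcome`, `TIMEOUT` and `current` are hypothetical names, not the code in src/docbuilder/rustwide_builder.rs, and the closure `run` stands in for one `cargo rustdoc` invocation. Beyond what the issue states, it assumes the unlocked pass only runs when the locked docs build did not succeed.

```rust
use std::time::Duration;

// Hypothetical types for illustration only; they are not the docs.rs source.
#[derive(Clone, Copy, Debug, PartialEq)]
enum Step {
    Coverage, // coverage run (`rustdoc --show-coverage`)
    Docs,     // the actual documentation build
}

#[derive(Clone, Copy, Debug, PartialEq)]
enum Outcome {
    Success,
    BuildError,
    Timeout,
}

// Per-invocation limit from the issue: 15 minutes.
const TIMEOUT: Duration = Duration::from_secs(15 * 60);

/// Current order: with the lock file, then without it, generate coverage and
/// then build docs, each invocation getting the full timeout.
/// Assumption: the unlocked pass only runs if the locked docs build failed.
/// `run(locked, step)` stands in for one `cargo rustdoc` invocation.
fn current(mut run: impl FnMut(bool, Step) -> (Outcome, Duration)) -> Duration {
    let mut total = Duration::ZERO;
    for locked in [true, false] {
        let (_coverage, t) = run(locked, Step::Coverage);
        total += t.min(TIMEOUT); // a single run can never exceed the timeout
        let (docs, t) = run(locked, Step::Docs);
        total += t.min(TIMEOUT);
        if docs == Outcome::Success {
            break; // no need to retry without the lock file
        }
    }
    total
}

fn main() {
    // A crate whose build always hits the timeout: all four invocations run.
    let worst = current(|_, _| (Outcome::Timeout, TIMEOUT));
    println!("worst case: {} minutes", worst.as_secs() / 60);
}
```

For a crate that always times out, all four invocations use their full budget and `main` prints `worst case: 60 minutes`.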

It would be better to generate the coverage data only if the docs build succeeded; then we would attempt only builds 1.b. and 2.b. before deciding that it timed out. But that has issues:
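A minimal sketch of that reordering, using the same kind of hypothetical model (`Step`, `Outcome`, `TIMEOUT` and `docs_first` are illustrative names, and `run` stands in for one `cargo rustdoc` invocation). It deliberately ignores the issues raised by the linked code, so it only shows the timing effect: a crate that always times out costs two timeouts (30 minutes) instead of four.

```rust
use std::time::Duration;

// Hypothetical types for illustration only; they are not the docs.rs source.
#[derive(Clone, Copy, Debug, PartialEq)]
enum Step {
    Coverage, // coverage run (`rustdoc --show-coverage`)
    Docs,     // the actual documentation build
}

#[derive(Clone, Copy, Debug, PartialEq)]
enum Outcome {
    Success,
    BuildError,
    Timeout,
}

// Per-invocation limit from the issue: 15 minutes.
const TIMEOUT: Duration = Duration::from_secs(15 * 60);

/// Proposed order: build docs first and only generate coverage data once the
/// docs build has succeeded. `run(locked, step)` stands in for one
/// `cargo rustdoc` invocation. The problems in the linked code are not modelled.
fn docs_first(mut run: impl FnMut(bool, Step) -> (Outcome, Duration)) -> Duration {
    let mut total = Duration::ZERO;
    for locked in [true, false] {
        let (docs, t) = run(locked, Step::Docs);
        total += t.min(TIMEOUT);
        if docs == Outcome::Success {
            // Coverage is only worth measuring for a crate that builds.
            let (_coverage, t) = run(locked, Step::Coverage);
            total += t.min(TIMEOUT);
            break;
        }
    }
    total
}

fn main() {
    let mut calls = Vec::new();
    let worst = docs_first(|locked, step| {
        calls.push((locked, step));
        (Outcome::Timeout, TIMEOUT)
    });
    // Only the two docs builds run; no coverage run is attempted.
    println!("{:?} -> {} minutes", calls, worst.as_secs() / 60);
}
```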

https://github.com/rust-lang/docs.rs/blob/2e5ef9b6d8f13b527436893a9a5e3e67019d5fb3/src/docbuilder/rustwide_builder.rs#L658-L660

One idea would be to skip the subsequent steps if one fails because of a timeout rather than a build error. It seems unlikely that unlocking the crate will turn a timeout into a successful build, or that rustdoc --show-coverage would somehow be the cause of a timeout rather than one of the dependencies.
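A minimal sketch of this idea under the same kind of hypothetical model (`Step`, `Outcome`, `TIMEOUT` and `stop_on_timeout` are illustrative names, not docs.rs APIs). A timeout in any invocation ends the whole sequence, while a build error still falls through to the remaining steps, so a crate that always times out costs a single timeout (15 minutes) and a crate that fails to compile still gets all four attempts.

```rust
use std::time::Duration;

// Hypothetical types for illustration only; they are not the docs.rs source.
#[derive(Clone, Copy, Debug, PartialEq)]
enum Step {
    Coverage, // coverage run (`rustdoc --show-coverage`)
    Docs,     // the actual documentation build
}

#[derive(Clone, Copy, Debug, PartialEq)]
enum Outcome {
    Success,
    BuildError,
    Timeout,
}

// Per-invocation limit from the issue: 15 minutes.
const TIMEOUT: Duration = Duration::from_secs(15 * 60);

/// Same four invocations, but give up as soon as one of them times out, since
/// unlocking or skipping the coverage run is unlikely to fix a timeout.
/// Build errors still fall through to the remaining steps.
/// `run(locked, step)` stands in for one `cargo rustdoc` invocation.
fn stop_on_timeout(
    mut run: impl FnMut(bool, Step) -> (Outcome, Duration),
) -> (Outcome, Duration) {
    let mut total = Duration::ZERO;
    let mut docs = Outcome::BuildError;
    for locked in [true, false] {
        for step in [Step::Coverage, Step::Docs] {
            let (outcome, t) = run(locked, step);
            total += t.min(TIMEOUT);
            if outcome == Outcome::Timeout {
                // Skip every remaining step instead of trying again.
                return (Outcome::Timeout, total);
            }
            if step == Step::Docs {
                docs = outcome;
            }
        }
        if docs == Outcome::Success {
            break; // the locked build worked, no unlocked retry needed
        }
    }
    (docs, total)
}

fn main() {
    let timed_out = stop_on_timeout(|_, _| (Outcome::Timeout, TIMEOUT));
    let failed = stop_on_timeout(|_, _| (Outcome::BuildError, Duration::from_secs(60)));
    println!("{:?} {:?}", timed_out, failed);
}
```

The design choice is to treat a timeout as a final answer for the release, trading away the rare case where a later invocation could still have succeeded.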

Nemo157 • Nov 15 '22 21:11

I remember that the whole build-attempt topic predates most (all?) of us.

To ask the "real" question: what kinds of errors are actually transient, i.e. would be solved by just trying again?

I imagine network errors, but these won't happen inside the Docker container itself, only outside of it, in our builder.

So what remains?

syphar • Nov 16 '22 15:11

This isn't actually related to retrying build attempts; these are 4 invocations of cargo rustdoc within the same build attempt.

In very niche cases it is actually possible for the 4th to succeed: the first two builds might time out because of a locked dependency that times out, the third build might hit a bug in rustdoc specific to --show-coverage that causes it to time out, and then the 4th build finally succeeds and gives us docs. I think that is unlikely enough that we shouldn't spend the extra build time supporting it.

Nemo157 • Nov 16 '22 15:11

I see there is still very much I don't know about the build process :)

So in extreme cases we would do 4 times 4 tries?

syphar • Nov 16 '22 17:11

No, in the end it 'successfully fails', so we don't increment the attempt counter and retry.

Nemo157 • Nov 16 '22 17:11

Ah, you're right.

syphar • Nov 16 '22 18:11