docs.rs icon indicating copy to clipboard operation
docs.rs copied to clipboard

Change timeout to apply to the overall build process, not per-build

Open Nemo157 opened this issue 2 years ago • 7 comments

fixes #1910

Nemo157 avatar Aug 07 '23 14:08 Nemo157

Mostly a draft because I want more discussion on the idea, this would drastically reduce the total amount of time crates can consume, from 5.5 hours (15 minutes per build, 2 builds per target (with/without --show-coverage), 10 targets, + default-target rebuild after deleting Cargo.lock) down to 15 minutes if we don't change the default limit.

Nemo157 avatar Aug 07 '23 14:08 Nemo157

One other case this potentially impacts a lot is when a crate is published with an incorrect lockfile, to get any docs the failure with the old lockfile and then the successful build with the new lockfile will have to complete together within the timeout.

I want to add more detailed logging and more visible metrics around this to the /crate/.../builds pages, so authors may improve their crates configuration, but would prefer to delay that till after the work I'm trying to get completed soon on #1011.

Nemo157 avatar Aug 07 '23 19:08 Nemo157

when we would deploy this, we wouln't have any idea how many crate builds we would break.

I assume #2186 would help a little, but only for the builds in the time observed. Same as #1011 when we record more information about the build.

Do we have other ways to optimize? For example, do we try a broken lockfile for each target?

syphar avatar Aug 08 '23 05:08 syphar

For example, do we try a broken lockfile for each target?

Nope, only on the default target, when the default target succeeds we do all the extra-targets with whichever lockfile was successful.

Nemo157 avatar Aug 08 '23 09:08 Nemo157

Without more data this change feels quite risky, what do you think about waiting for the results of #2186 and then deciding?

syphar avatar Aug 09 '23 06:08 syphar

Sounds good.

Nemo157 avatar Aug 09 '23 09:08 Nemo157

Some stats from the last week:

image

The orange line is 15 minutes, the red line is 30 minutes. We did have the 99th percentile exceed 30 minutes briefly, but I think it's prettty safe as an overall timeout if it'll just cause some targets to be missing.

Nemo157 avatar Aug 27 '23 19:08 Nemo157