Change timeout to apply to the overall build process, not per-build
fixes #1910
Mostly a draft because I want more discussion on the idea, this would drastically reduce the total amount of time crates can consume, from 5.5 hours (15 minutes per build, 2 builds per target (with/without --show-coverage), 10 targets, + default-target rebuild after deleting Cargo.lock) down to 15 minutes if we don't change the default limit.
One other case this potentially impacts a lot is when a crate is published with an incorrect lockfile, to get any docs the failure with the old lockfile and then the successful build with the new lockfile will have to complete together within the timeout.
I want to add more detailed logging and more visible metrics around this to the /crate/.../builds pages, so authors may improve their crates configuration, but would prefer to delay that till after the work I'm trying to get completed soon on #1011.
when we would deploy this, we wouln't have any idea how many crate builds we would break.
I assume #2186 would help a little, but only for the builds in the time observed. Same as #1011 when we record more information about the build.
Do we have other ways to optimize? For example, do we try a broken lockfile for each target?
For example, do we try a broken lockfile for each target?
Nope, only on the default target, when the default target succeeds we do all the extra-targets with whichever lockfile was successful.
Without more data this change feels quite risky, what do you think about waiting for the results of #2186 and then deciding?
Sounds good.
Some stats from the last week:
The orange line is 15 minutes, the red line is 30 minutes. We did have the 99th percentile exceed 30 minutes briefly, but I think it's prettty safe as an overall timeout if it'll just cause some targets to be missing.