
[rush] unassigned operations can ignore weighting constraints

Open · aramissennyeydd opened this issue 1 year ago

Summary

We're dealing with some build cache inconsistencies across cobuild agents. We're seeing agents that use the same lock but cannot restore the completed state, because they don't compute the same build cache ID. This bug allows two expensive operations to run side by side, which causes memory pressure and timeouts related to it.

Details

From what I can tell, the unassigned operation needs to carry a weight matching the operation it may pick up. The reason is that https://github.com/microsoft/rushstack/blob/c1effc398416068b7f276bac32044f775643295d/libraries/rush-lib/src/logic/operations/OperationExecutionManager.ts#L265 can start an operation that it concludes was not completed on the agent that originally locked it in the CacheableOperationPlugin (https://github.com/microsoft/rushstack/blob/c1effc398416068b7f276bac32044f775643295d/libraries/rush-lib/src/logic/operations/CacheableOperationPlugin.ts#L388-L398). If that weight is not reflected onto the unassigned operation, multiple operations can be picked up without passing the weight check. For example, say you have 3 operations, parallelism set to 8, and 2 machines:

  1. Operation A with weight 8
  2. Operation B with weight 8
  3. Operation C with weight 4

Start: Machine 1 picks up Operation A; Machine 2 picks up Operation B.

Step 1: Machine 1 finishes Operation A but fails to mark it complete; Machine 2 finishes Operation B.

Step 2: Machine 2 picks up both Operation A and Operation C, a combined weight of 12 against a parallelism of 8.
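To make the arithmetic in Step 2 concrete, here is a rough TypeScript sketch of a weight-bounded picker on a single agent. It is a hypothetical model, not rush-lib code: `Operation`, `pickUp`, `realLoad`, and `PLACEHOLDER_WEIGHT` are made-up names, and the weight charged for an operation arriving through the unassigned path is assumed to be 1 purely for illustration.

```typescript
// Hypothetical model of weight-bounded scheduling on one agent.
// None of these names come from rush-lib; they only illustrate the issue.
interface Operation {
  name: string;
  weight: number;
}

// Weight charged for an operation reclaimed via the "unassigned" path when its
// real weight is NOT reflected. Assumed to be 1 here for illustration only.
const PLACEHOLDER_WEIGHT = 1;

// Starts queued operations in order while the *charged* weight fits within
// `parallelism`. `reclaimed` names operations picked up via the unassigned path.
function pickUp(
  queue: Operation[],
  parallelism: number,
  reclaimed: Set<string>,
  chargeRealWeight: boolean
): Operation[] {
  const started: Operation[] = [];
  let charged = 0;
  for (const op of queue) {
    const cost =
      reclaimed.has(op.name) && !chargeRealWeight ? PLACEHOLDER_WEIGHT : op.weight;
    if (charged + cost <= parallelism) {
      started.push(op);
      charged += cost;
    }
  }
  return started;
}

// The load the machine actually experiences is the sum of the real weights.
function realLoad(ops: Operation[]): number {
  return ops.reduce((sum, op) => sum + op.weight, 0);
}

// Step 2 on Machine 2: Operation A is reclaimed, Operation C is queued.
const opA: Operation = { name: "A", weight: 8 };
const opC: Operation = { name: "C", weight: 4 };
const reclaimedOps = new Set<string>(["A"]);

const buggy = pickUp([opA, opC], 8, reclaimedOps, false);
const fixed = pickUp([opA, opC], 8, reclaimedOps, true);

console.log(buggy.map((op) => op.name), realLoad(buggy)); // [ 'A', 'C' ] 12
console.log(fixed.map((op) => op.name), realLoad(fixed)); // [ 'A' ] 8
```

In this model, charging the reclaimed operation its real weight (the change proposed here) keeps Machine 2 within its budget of 8, while the placeholder weight lets it run a real load of 12.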

The "finishes but fails to mark complete" case can happen with a build cache ID inconsistency (what we're seeing), or when a machine is lost during execution and never reports its success state, so another machine picks up the operation.
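The cache ID side of the problem can be modeled in a few lines. This is a hypothetical TypeScript sketch, not rush-lib's cobuild or build cache implementation: `Agent`, `markComplete`, `isCompletedElsewhere`, and the cache ID strings are illustrative assumptions. It shows how two agents that share a lock but compute different build cache IDs end up with one agent treating a finished operation as unfinished.

```typescript
// Hypothetical model, not rush-lib's API: completion is recorded under the
// build cache ID that each agent computes for an operation.
const completedCacheIds = new Set<string>();

interface Agent {
  name: string;
  // Build cache ID this agent derives for Operation A. In the reported bug the
  // agents share the same cobuild lock but derive different IDs.
  cacheId: string;
}

// Record that the operation finished, keyed by this agent's cache ID.
function markComplete(agent: Agent): void {
  completedCacheIds.add(agent.cacheId);
}

// An agent only sees the completion if it looks up the same cache ID.
function isCompletedElsewhere(agent: Agent): boolean {
  return completedCacheIds.has(agent.cacheId);
}

const machine1: Agent = { name: "machine-1", cacheId: "hash-from-machine-1" };
const machine2: Agent = { name: "machine-2", cacheId: "hash-from-machine-2" };

// Machine 1 finishes Operation A and records it under its own cache ID.
markComplete(machine1);

// Machine 2 looks under a different ID, misses, and would pick Operation A up again.
const machine2Reruns = !isCompletedElsewhere(machine2);
console.log(machine2Reruns); // true
```

Once an agent reaches that "not completed" conclusion, the reclaimed operation flows through the unassigned path, which is where the missing weight matters.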

How it was tested

Ideas are welcome on how to test this consistently with a cobuild setup. I added a unit test for the weight of the unassigned operation.

Impacted documentation

None.

aramissennyeydd, Jun 28 '24 16:06