bazel
bazel copied to clipboard
Incorrect estimate of available memory with --experimental_local_memory_estimate
Description of the problem / feature request:
--experimental_local_memory_estimate seems to produce an incorrect estimate of available memory, causing builds to be delayed by many minutes.
Bugs: what's the simplest, easiest way to reproduce this bug? Please provide a minimal example if possible.
When checking where --experimental_local_memory_estimate is evaluated, it assigns
double reserveMemory = staticResources.getMemoryMb();
and then computes
remainingRam = totalFreeRam - reserveMemory;
As staticResources appears to be a copy of availableResources (both are set in setAvailableResources; availableResources is public but not manipulated elsewhere as far as I can tell), I believe that the computation should be this:
remainingRam = Math.min(remainingRam, totalFreeRam);
If my interpretation is correct, assigning more memory to Bazel would actually leave less remaining memory because all the assigned memory is being treated as reserved elsewhere.
https://github.com/bazelbuild/bazel/blob/fa9fabb3f699f1e8e8ebbe22228fc4acbc9df2ba/src/main/java/com/google/devtools/build/lib/actions/ResourceManager.java#L389-L392
What operating system are you running Bazel on?
LSB Version: :core-4.1-amd64:core-4.1-noarch
Distributor ID: Amazon
Description: Amazon Linux release 2 (Karoo)
Release: 2
Codename: Karoo
What's the output of bazel info release?
INFO: Invocation ID: 2a8b8b1f-918a-4d50-bcfb-fdf655a6dabe
release 4.0.0
I would probably use the following to reduce the chance of out-of-memory errors:
remainingRam = Math.min(remainingRam, 0.95 * totalFreeRam);
@ulfjack I think you added this flag. Any thoughts?
Possibly the bug was introduced in this commit @susinmotion ?
https://github.com/bazelbuild/bazel/commit/7f530c88cf3b66b7ff0c585f1cedf401e01138f0#diff-95029ccab6cd5d02bb23eb7c648ed9b57da4f02f21dc95a903aed959a28754caR390
- double reserveMemory =
- staticResources.getMemoryMb() * (100.0 - this.ramUtilizationPercentage) / 100.0;
+ double reserveMemory = staticResources.getMemoryMb();
remainingRam = totalFreeRam - reserveMemory;
I think the previous logic tried to achieve something like:
memoryNotAvailableToBazel = totalMachineMemory - staticResources.getMemoryMb()
remainingRam = totalFreeRam - memoryNotAvailableToBazel
Thank you a lot for these notices! I will change a logic in this code to remove the old incorrect code.
Per this thread in the bazel slack, it appears that --experimental_local_memory_estimate causes Bazel to execute no more than one local action at a time (as observed with ps). Could the incorrect estimate reported here be responsible for this behavior?
FYI, 7.0.0-pre contains https://github.com/bazelbuild/bazel/commit/361ce673ad2b959b73e859a121ff2e996feb561b which removes this option altogether