renaissance
Philosophers: Inverse scalability "problem"
We have been studying the performance of philosophers on large machines, and realized that the number of CPUs on the machine determines the number of philosophers in the benchmark.
This means that machines with different numbers of CPUs run different workloads, which makes cross-hardware comparisons misleading. AFAICS, this is not what benchmarks usually do: in most benchmarks, the global amount of work stays the same as the available hardware parallelism grows, so the score shows either an improvement due to parallelism or a degradation due to contention. In philosophers, adding hardware parallelism just makes the benchmark slower, because the global amount of work is larger, on top of the usual contention effects.
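To make the scaling argument concrete, here is a minimal Java sketch of the two workload models. It is not taken from the Renaissance sources: the class name, the method names, and the per-philosopher constant are hypothetical, and it assumes each philosopher performs a fixed amount of work, which the real benchmark may or may not do.

```java
// Hypothetical model, not the Renaissance source: it only illustrates how the
// total work changes when the number of philosophers follows the CPU count.
final class PhilosophersWorkModel {
    // Assumed per-philosopher workload; the real benchmark's value may differ.
    static final int MEALS_PER_PHILOSOPHER = 1_000;

    // Total meals when the number of philosophers equals the CPU count.
    static long totalMealsScaledWithCpus(int cpus) {
        return (long) cpus * MEALS_PER_PHILOSOPHER;
    }

    // Total meals when the number of philosophers is pinned to a fixed value.
    static long totalMealsFixed(int fixedPhilosophers) {
        return (long) fixedPhilosophers * MEALS_PER_PHILOSOPHER;
    }

    public static void main(String[] args) {
        // availableProcessors() is the CPU count as seen by the JVM.
        int cpus = Runtime.getRuntime().availableProcessors();
        System.out.println("CPUs seen by the JVM: " + cpus);
        System.out.println("meals, philosophers = CPUs: " + totalMealsScaledWithCpus(cpus));
        System.out.println("meals, philosophers = 8:    " + totalMealsFixed(8));
    }
}
```

Under this assumption, the first model's total work grows linearly with the CPU count, while the second stays constant no matter how many CPUs the machine has.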
The easy way to demonstrate this is to override -XX:ActiveProcessorCount=# on a large 64-core machine:
$ shipilev-jdk21u-dev/build/linux-aarch64-server-release/images/jdk/bin/java -Xmx4g -Xms4g -XX:+AlwaysPreTouch -XX:+UnlockDiagnosticVMOptions -XX:ActiveProcessorCount=... -jar renaissance-jmh-0.15.0.jar Philosophers -f 5 -wi 5 -i 5
ActiveProcessorCount=1: 230.081 ± 12.516 ms/op
ActiveProcessorCount=2: 1570.336 ± 75.888 ms/op
ActiveProcessorCount=4: 1893.643 ± 85.768 ms/op
ActiveProcessorCount=8: 2466.867 ± 114.564 ms/op
ActiveProcessorCount=16: 3374.587 ± 182.243 ms/op
ActiveProcessorCount=32: 5097.616 ± 330.096 ms/op
ActiveProcessorCount=64: 10788.201 ± 1470.015 ms/op
(The benchmark also thrashes hard when all CPUs are busy, but I think that is just the way it works.)
I don't have a good solution for this, except maybe setting the number of philosophers to some fixed value.
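A rough sketch of what "some fixed value" could look like, written in Java for illustration. Everything here is an assumption rather than existing Renaissance code: the class name, the default of 8, and the `philosophers.count` property are all made up, and the actual benchmark may expose its configuration differently.

```java
// Hypothetical helper, not part of Renaissance: picks the philosopher count
// from a fixed default instead of the machine's CPU count.
final class PhilosopherCount {
    // Assumed default; any constant works as long as it is the same everywhere.
    static final int DEFAULT_PHILOSOPHERS = 8;
    // Made-up system property name that lets a user opt into another count.
    static final String OVERRIDE_PROPERTY = "philosophers.count";

    // Returns the override if one is given, otherwise the fixed default.
    static int resolve(String override) {
        if (override == null || override.trim().isEmpty()) {
            return DEFAULT_PHILOSOPHERS;
        }
        int count = Integer.parseInt(override.trim());
        if (count < 2) {
            throw new IllegalArgumentException("need at least 2 philosophers, got " + count);
        }
        return count;
    }

    public static void main(String[] args) {
        // Deliberately does not consult Runtime.availableProcessors().
        System.out.println(resolve(System.getProperty(OVERRIDE_PROPERTY)));
    }
}
```

With a default like this, every machine runs the same workload unless the user asks for something else, so any difference in score would come from the hardware rather than from the amount of work.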