pyperformance
Adjust benchmarks for PyPy
On PyPy, I don't think that pyperformance currently measures performance correctly after the code has been optimized by the JIT compiler. I started a thread on the PyPy mailing list: https://mail.python.org/pipermail/pypy-dev/2017-April/015101.html
perf 1.2 adds a new "warmup calibration" feature, but it seems unstable. I will probably try to manually compute the number of required warmups for PyPy 5.7 on the speed-python server and use hardcoded values instead.
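The hardcoded-warmup idea can be sketched in plain Python. This is my own illustration, not perf's actual API: run the workload a fixed number of unmeasured times so a JIT can optimize it, then record only the timed runs.

```python
import time

def bench_with_warmups(func, warmups, values):
    """Hypothetical sketch of a fixed-warmup benchmark loop.

    Run `func` `warmups` times without measuring (letting a JIT such as
    PyPy's optimize the code), then record `values` timed runs.
    """
    for _ in range(warmups):
        func()
    timings = []
    for _ in range(values):
        t0 = time.perf_counter()
        func()
        timings.append(time.perf_counter() - t0)
    return timings

# Example: 3 warmup runs are discarded, 5 timed runs are kept.
timings = bench_with_warmups(lambda: sum(range(1000)), warmups=3, values=5)
```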
I agree the warmup number should be fixed. It will reduce uncertainty about what a benchmark is measuring.
IMO each benchmark should be versioned so we can be sure result comparisons are apples-to-apples. The speed.python.org website should only show the latest version of each benchmark, which would ensure that changing the warmup number does not cause a weird jump in the timeline display.
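The "latest version only" rule could look something like the sketch below. The `(benchmark, version, value)` tuple shape is hypothetical, not the actual speed.python.org schema:

```python
def latest_results(results):
    """Keep only the results recorded under each benchmark's highest
    version number, so a timeline never mixes benchmark versions.

    `results` is a hypothetical list of (name, version, value) tuples.
    """
    best = {}
    for name, version, value in results:
        if name not in best or version > best[name][0]:
            # A newer benchmark version invalidates older results.
            best[name] = (version, [])
        if version == best[name][0]:
            best[name][1].append(value)
    return {name: values for name, (version, values) in best.items()}

# Example: "go" was re-versioned, so its v1 result is dropped.
shown = latest_results([
    ("go", 1, 10.0),
    ("go", 2, 8.0),
    ("go", 2, 7.0),
    ("hexiom", 1, 5.0),
])
```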
I would like to begin testing PyPy3.6 and uploading results to speed.python.org, but there should be agreement that PyPy results are valid and meaningful. Do we require fixed warmup values, or just an upper limit on the number of loops before we give up on convergence?
Currently there are warmup values for bm_go, bm_hexiom, and bm_tornado_http. Following the note about warmups, can/should we add values for the other benchmarks? Since the timings sometimes do not converge to a single value, we should also have a maximum number of loops.
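A convergence check with a hard cap might look like this. The tolerance, window size, and cap are made-up knobs for illustration, not values taken from perf:

```python
def calibrate_warmups(timings, tolerance=0.05, window=5, max_warmups=100):
    """Hypothetical warmup calibration from a list of per-run timings.

    Return the number of leading runs to discard as warmups, i.e. the
    point after which the next `window` runs stay within `tolerance`
    (relative spread) of each other. Return None if no convergence is
    seen within `max_warmups` runs, so the caller can give up.
    """
    for end in range(window, min(len(timings), max_warmups) + 1):
        recent = timings[end - window:end]
        lo, hi = min(recent), max(recent)
        if lo > 0 and (hi - lo) / lo <= tolerance:
            return end - window  # runs before the stable window
    return None

# A JIT-like curve: fast convergence after 3 runs.
converging = [10.0, 5.0, 2.0, 1.0, 1.01, 0.99, 1.0, 1.02]
# Oscillating timings that never settle.
diverging = [1.0, 2.0] * 10
```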
Something should be done, I don't know what.