pyperformance
Different number of values for python_startup and python_startup_no_site between CPython 2.7 and PyPy 2.7
The problem
I tried to compare performance results between different Python versions and implementations. While the comparison between CPython 3.6 and CPython 2.7 works as expected, I get an exception when comparing the results obtained with CPython 2.7.13 and PyPy 2.7.13.
Exact versions:
CPython:
Python 2.7.13 |Continuum Analytics, Inc.| (default, Dec 20 2016, 23:05:08)
[GCC 4.2.1 Compatible Apple LLVM 6.0 (clang-600.0.57)] on darwin
PyPy:
Python 2.7.13 (c925e7381036, Jun 05 2017, 20:53:58)
[PyPy 5.8.0 with GCC 4.2.1 Compatible Apple LLVM 5.1 (clang-503.0.40)]
How to reproduce the issue
Run from CPython 2 environment:
python -m performance run -o py27.json
Run from PyPy environment:
pypy -m performance run -o pypy27.json
Compare:
pyperformance compare -O table py27.json pypy27.json
What you expected to happen
I expected a table like this:

+-------------------------+-----------+-------------+-----------------+-------------------------+
| Benchmark               | py27.json | pypy27.json | Change          | Significance            |
+=========================+===========+=============+=================+=========================+
| 2to3                    | 767 ms    | 1.63 sec    | 2.13x slower    | Significant (t=-142.45) |
+-------------------------+-----------+-------------+-----------------+-------------------------+
| chaos                   | 215 ms    | 5.58 ms     | 38.62x faster   | Significant (t=204.35)  |
+-------------------------+-----------+-------------+-----------------+-------------------------+
What actually happens
I get this exception:
compare.py", line 212, in __init__
raise RuntimeError("base and changed don't have "
RuntimeError: base and changed don't have the same number of values
Note: The line number may have changed due to my debug prints.
Cause
The number of values for the benchmarks python_startup and python_startup_no_site differs between the two result files: 200 for CPython versus 60 for PyPy (the same counts for both benchmarks).
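For reference, here is a small sketch (not from the report) that loads both result files and prints the number of values per common benchmark, so the mismatch is visible before running the comparison. It assumes the same perf BenchmarkSuite API already used in compare.py (load(), get_benchmark_names(), get_benchmark(), get_values()) and the file names from the reproduction steps above:

import perf

def print_value_counts(base_filename, changed_filename):
    # Load both benchmark suites and report how many values each
    # common benchmark has in each file.
    base_suite = perf.BenchmarkSuite.load(base_filename)
    changed_suite = perf.BenchmarkSuite.load(changed_filename)
    common = set(base_suite.get_benchmark_names()) & set(
        changed_suite.get_benchmark_names())
    for name in sorted(common):
        nbase = len(base_suite.get_benchmark(name).get_values())
        nchanged = len(changed_suite.get_benchmark(name).get_values())
        marker = '' if nbase == nchanged else '  <-- mismatch'
        print('%s: %d vs %d%s' % (name, nbase, nchanged, marker))

print_value_counts('py27.json', 'pypy27.json')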
My workaround
I just skipped python_startup and python_startup_no_site by adding

    if name in ('python_startup', 'python_startup_no_site'):
        continue

to compare.compare_results:
def compare_results(options):
    base_label, changed_label = get_labels(options.baseline_filename,
                                           options.changed_filename)

    base_suite = perf.BenchmarkSuite.load(options.baseline_filename)
    changed_suite = perf.BenchmarkSuite.load(options.changed_filename)

    results = []
    common = set(base_suite.get_benchmark_names()) & set(
        changed_suite.get_benchmark_names())
    for name in sorted(common):
        print(name)  # debug print
        # Workaround: skip the benchmarks whose number of values differs
        # between the CPython and PyPy results.
        if name in ('python_startup', 'python_startup_no_site'):
            continue
        base_bench = base_suite.get_benchmark(name)
        changed_bench = changed_suite.get_benchmark(name)
        result = BenchmarkResult(base_bench, changed_bench)
        results.append(result)
Suggested better solution
Either:
- Allow a command-line argument to explicitly skip comparison of selected tests.
- Skip non-comparable tests automatically and just list them at the end; make this optional via a command-line switch (a sketch follows below).
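A minimal sketch of the second option, kept close to the compare_results structure quoted above; the skipped list and the final summary line are hypothetical additions, and the later part of the real function (rendering the table) is omitted:

def compare_results(options):
    base_suite = perf.BenchmarkSuite.load(options.baseline_filename)
    changed_suite = perf.BenchmarkSuite.load(options.changed_filename)

    results = []
    skipped = []
    common = set(base_suite.get_benchmark_names()) & set(
        changed_suite.get_benchmark_names())
    for name in sorted(common):
        base_bench = base_suite.get_benchmark(name)
        changed_bench = changed_suite.get_benchmark(name)
        # Skip benchmarks that cannot be compared instead of crashing.
        if len(base_bench.get_values()) != len(changed_bench.get_values()):
            skipped.append(name)
            continue
        results.append(BenchmarkResult(base_bench, changed_bench))

    # List the non-comparable benchmarks at the end instead of aborting.
    if skipped:
        print('Skipped (different number of values): %s' % ', '.join(skipped))
    return results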
You should be able to use the "python3 -m perf compare_to ref.json patch.json" command to compare your two benchmark results.
In perf, a warning is emitted if perf cannot check whether the difference is significant; it doesn't crash. We should probably do the same in performance. Or maybe rewrite performance using perf compare?
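For illustration, a hedged sketch of the "warn, don't crash" behaviour applied to the comparison loop; it relies only on the RuntimeError shown in the traceback above, and where exactly this would live in compare.py is an assumption:

for name in sorted(common):
    base_bench = base_suite.get_benchmark(name)
    changed_bench = changed_suite.get_benchmark(name)
    try:
        result = BenchmarkResult(base_bench, changed_bench)
    except RuntimeError as exc:
        # Mirror perf's behaviour: emit a warning and keep going rather
        # than aborting the whole comparison because of one benchmark.
        print('WARNING: cannot compare %s: %s' % (name, exc))
        continue
    results.append(result)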