hyperjson
hyperjson copied to clipboard
Fnv hasher with default capacity
This PR goes on top of https://github.com/mre/hyperjson/pull/43
Changes:
- Initialize
entiteswith default capacity > 0.
This may safe some extra memory allocations.
Benchmarks are again slightly contradictional. Please consider this PR only like an idea. Feel free to reject it if it does not bring any useful improvement
Thanks for your PRs @greyblake. For sure sounds like a nice idea to try. I'm gonna run the benchmarks, but what I would really love to do is running a profiler on the new and the old version to see what changed. Just haven't gotten around to add this to the project (see #38).
So I ran the benchmarks on my machine and the values of this branch are very close to the master branch, well within the standard deviation. That means I can't make any conclusive judgement as to what is the faster version. Maybe others can try to reproduce on their machine?
Here are my benchmarks for Python 3.5, done on my laptop (debian).
master.txt fnv-hasher.txt fnv-hasher-with-default-capacity.txt
Here the result aggregated in one spreadsheet: https://docs.google.com/spreadsheets/d/1vrERpk-QLZYLQOHu8nh6fIAeTdEbNxp5bJEIPc-N-MM/edit?usp=sharing
However, if I run the benchmark multiple times I get different results, so yea, based on this it's hard to judge if this is a real improvement. Would it make sense to increase number of iterations in the benchmarks?
Pretty similar to what I saw in my benchmarks. You can control the number of iterations and rounds with the following parameters:
pipenv run pytest benchmarks --benchmark-min-rounds=BENCHMARK_MIN_ROUNDS --benchmark-warmup-iterations=NUM
The docs are here. I haven't tried that myself, though. 😉
Ok, running the following command:
time pipenv run pytest benchmarks --benchmark-warmup-iterations=100000 --benchmark-min-rounds=100000
Shows to me a more or less reproducible result (I've tried 2 times).
fnv-hasher branch slightly faster than master, and fnv-hasher-with-default-capacity is slightly faster than fnv-hasher. In particular (total time):
master: 2m40.388s
fnv-hasher: 2m38.193s
fnv-hasher-with-default-capacity: 2m37.594s
I'm in the process of setting up a machine for profiling. Have a dedicated Linux box now for that purpose. If anybody has time to profile the code before me, feel free to do that and add some data here.