hyperjson icon indicating copy to clipboard operation
hyperjson copied to clipboard

Fnv hasher with default capacity

Open greyblake opened this issue 7 years ago • 6 comments

This PR goes on top of https://github.com/mre/hyperjson/pull/43

Changes:

  • Initialize entites with default capacity > 0.

This may safe some extra memory allocations.

Benchmarks are again slightly contradictional. Please consider this PR only like an idea. Feel free to reject it if it does not bring any useful improvement

greyblake avatar Sep 21 '18 18:09 greyblake

Thanks for your PRs @greyblake. For sure sounds like a nice idea to try. I'm gonna run the benchmarks, but what I would really love to do is running a profiler on the new and the old version to see what changed. Just haven't gotten around to add this to the project (see #38).

mre avatar Sep 21 '18 21:09 mre

So I ran the benchmarks on my machine and the values of this branch are very close to the master branch, well within the standard deviation. That means I can't make any conclusive judgement as to what is the faster version. Maybe others can try to reproduce on their machine?

main.txt fnv-bench.txt

mre avatar Sep 21 '18 21:09 mre

Here are my benchmarks for Python 3.5, done on my laptop (debian).

master.txt fnv-hasher.txt fnv-hasher-with-default-capacity.txt

Here the result aggregated in one spreadsheet: https://docs.google.com/spreadsheets/d/1vrERpk-QLZYLQOHu8nh6fIAeTdEbNxp5bJEIPc-N-MM/edit?usp=sharing

However, if I run the benchmark multiple times I get different results, so yea, based on this it's hard to judge if this is a real improvement. Would it make sense to increase number of iterations in the benchmarks?

greyblake avatar Sep 22 '18 14:09 greyblake

Pretty similar to what I saw in my benchmarks. You can control the number of iterations and rounds with the following parameters:

pipenv run pytest benchmarks --benchmark-min-rounds=BENCHMARK_MIN_ROUNDS --benchmark-warmup-iterations=NUM

The docs are here. I haven't tried that myself, though. 😉

mre avatar Sep 22 '18 15:09 mre

Ok, running the following command:

time pipenv run pytest benchmarks  --benchmark-warmup-iterations=100000 --benchmark-min-rounds=100000

Shows to me a more or less reproducible result (I've tried 2 times). fnv-hasher branch slightly faster than master, and fnv-hasher-with-default-capacity is slightly faster than fnv-hasher. In particular (total time):

master: 	                  2m40.388s
fnv-hasher:                       2m38.193s
fnv-hasher-with-default-capacity: 2m37.594s

greyblake avatar Sep 22 '18 18:09 greyblake

I'm in the process of setting up a machine for profiling. Have a dedicated Linux box now for that purpose. If anybody has time to profile the code before me, feel free to do that and add some data here.

mre avatar Nov 10 '18 16:11 mre