[WIP] Add FNV1a hash function
Fixes #1757
Test driving the FNV hash. Benchmarks to do:
- [ ] computation time
- [ ] "quality" of the hashing (returning a constant would be a fast hash, but low quality); not sure yet how to measure this.
- [ ] Is it mergeable?
- [ ] `make test` Did it pass the tests?
- [ ] `make clean diff-cover` If it introduces new functionality in `scripts/`, is it tested?
- [ ] `make format diff_pylint_report cppcheck doc pydocstyle` Is it well formatted?
- [ ] Did it change the command-line interface? Only backwards-compatible additions are allowed without a major version increment. Changing file formats also requires a major version number increment.
- [ ] For substantial changes or changes to the command-line interface, is it documented in CHANGELOG.md? See keepachangelog for more details.
- [ ] Was a spellchecker run on the source code and documentation after changes were made?
- [ ] Do the changes respect streaming IO? (Are they tested for streaming IO?)
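For reference, here is a minimal Python sketch of 64-bit FNV-1a using its standard parameters (offset basis `0xcbf29ce484222325`, prime `0x100000001b3`). This is an illustration of the algorithm being test-driven, not the PR's actual C++ implementation:

```python
# Stand-alone sketch of 64-bit FNV-1a with the commonly published
# constants. Illustrative only; not the code in src/oxli/kmer_hash.cc.

FNV64_OFFSET_BASIS = 0xcbf29ce484222325
FNV64_PRIME = 0x100000001b3
MASK64 = (1 << 64) - 1  # emulate 64-bit wraparound in Python


def fnv1a_64(data: bytes) -> int:
    """Fold each byte into the hash: XOR first, then multiply by the prime."""
    h = FNV64_OFFSET_BASIS
    for byte in data:
        h ^= byte
        h = (h * FNV64_PRIME) & MASK64
    return h
```

The XOR-then-multiply order is what distinguishes FNV-1a from plain FNV-1; it mixes the low bits of each input byte before the multiply spreads them upward.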
Surprising how few tests fail with this!
nice :).
Results from running this script as a benchmark.
```
$ python fp.py bf  # FNV
kind, load, query, t_load, t_queryNP, t_queryP
bf, 9000000, 1000000, 10.381915807723999, 1.5168020725250244, 1.2786040306091309
bf, 9000000, 1000000, 10.489625930786133, 1.3121399879455566, 1.2331571578979492
bf, 9000000, 1000000, 9.589091062545776, 1.1095459461212158, 1.0996849536895752
bf, 9000000, 1000000, 10.481687068939209, 1.2617030143737793, 1.159147024154663

$ python fp.py bf  # murmur
kind, load, query, t_load, t_queryNP, t_queryP
bf, 9000000, 1000000, 10.766052007675171, 1.184602975845337, 1.1206409931182861
bf, 9000000, 1000000, 9.716826915740967, 1.169848918914795, 1.0594408512115479
bf, 9000000, 1000000, 10.398908853530884, 1.231614112854004, 1.2056090831756592
bf, 9000000, 1000000, 11.516963958740234, 1.6280360221862793, 1.1633579730987549
bf, 9000000, 1000000, 10.743386030197144, 1.2828280925750732, 1.1859049797058105
```
The columns to look at are the last three, which measure how long it takes to load k-mers, to look up k-mers that aren't present, and to look up k-mers that are present.
Observations: the benchmark isn't very useful because it's pretty noisy, but even so I'd conclude that FNV1a is no faster than murmurhash. I find that at least somewhat surprising.
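One way to cut the noise would be to time many hash calls per sample and keep the minimum over several repeats, which discards scheduler jitter. A hedged sketch using the standard library's `timeit`; the `fnv1a_64` below is a stand-in for whichever hash binding is being compared, and the k-mer set is synthetic:

```python
# Lower-noise micro-benchmark sketch: min over repeats is a more stable
# timing statistic than a single run or the mean.
import timeit

MASK64 = (1 << 64) - 1


def fnv1a_64(data: bytes) -> int:
    # stand-in hash; swap in the real binding under test
    h = 0xcbf29ce484222325
    for b in data:
        h = ((h ^ b) * 0x100000001b3) & MASK64
    return h


alpha = b"ACGT"
# 1000 synthetic 21-mers cycling through the DNA alphabet
kmers = [bytes(alpha[(i + j) % 4] for j in range(21)) for i in range(1000)]


def workload():
    for k in kmers:
        fnv1a_64(k)


best = min(timeit.repeat(workload, number=10, repeat=5))
print(f"best of 5 repeats: {best:.4f}s for 10,000 hashes")
```

The same harness can then be pointed at the FNV and murmur builds in turn so both see identical inputs.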
Question for all y'all - are we benchmarking with different numbers of hash tables? I'm wondering if the prime number modulus is causing any issues.
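On the quality side (and the prime-modulus question), one concrete check would be to reduce hash values modulo a prime table size versus a power-of-two size and compare how uniform the bucket counts are via a chi-squared statistic. A sketch under made-up table sizes, again using a stand-in Python FNV-1a rather than the PR's C++ code:

```python
# Bucket-uniformity check: a high-quality hash should fill buckets
# roughly uniformly under either modulus; a weak hash can alias badly
# when the table size is a power of two.
from collections import Counter
from itertools import product

MASK64 = (1 << 64) - 1


def fnv1a_64(data: bytes) -> int:
    h = 0xcbf29ce484222325
    for b in data:
        h = ((h ^ b) * 0x100000001b3) & MASK64
    return h


# all 4**8 = 65,536 possible 8-mers over the DNA alphabet
kmers = [bytes(k) for k in product(b"ACGT", repeat=8)]

for size, label in [(769, "prime"), (1024, "power of two")]:
    counts = Counter(fnv1a_64(k) % size for k in kmers)
    expected = len(kmers) / size
    chi2 = sum((counts.get(i, 0) - expected) ** 2 / expected
               for i in range(size))
    print(f"{label:13s} size={size:5d} chi2={chi2:.1f} (dof ~ {size - 1})")
```

For a uniform hash the statistic should land near the degrees of freedom (table size minus one); a value far above that for one modulus but not the other would point at aliasing rather than at the hash itself.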
I am still stuck on the question of (a) whether things could be faster, (b) if so, how easy it would be to speed things up. Right now we seem to be stuck in the never never land of "it's hard to identify any one factor that is slowing things down."
Codecov Report
Merging #1760 into master will increase coverage by <.01%. The diff coverage is 0%.
```diff
@@            Coverage Diff            @@
##           master    #1760    +/-   ##
=========================================
+ Coverage    0.05%    0.06%   +<.01%
=========================================
  Files          87       78       -9
  Lines       11441     9807    -1634
  Branches     3072     2459     -613
=========================================
  Hits            6        6
+ Misses      11435     9801    -1634
```
| Impacted Files | Coverage Δ | |
|---|---|---|
| include/oxli/hashtable.hh | 0% <ø> (ø) | :arrow_up: |
| include/oxli/kmer_hash.hh | 0% <ø> (ø) | :arrow_up: |
| src/oxli/kmer_hash.cc | 0% <0%> (ø) | :arrow_up: |
| src/oxli/hllcounter.cc | 0% <0%> (ø) | :arrow_up: |
| scripts/abundance-dist.py | 0% <0%> (ø) | :arrow_up: |
| scripts/trim-low-abund.py | 0% <0%> (ø) | :arrow_up: |
| scripts/make-initial-stoptags.py | 0% <0%> (ø) | :arrow_up: |
| src/oxli/labelhash.cc | 0% <0%> (ø) | :arrow_up: |
| include/oxli/hashgraph.hh | 0% <0%> (ø) | :arrow_up: |
| src/khmer/_cpy_khmer.cc | 0% <0%> (ø) | :arrow_up: |
| ... and 14 more | | |
Continue to review full report at Codecov.
Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update acdacef...43aaf93. Read the comment docs.