ByteData abstraction impacts performance
I was thinking about rehashing cost in ByteArrayOrdinalMap when running a cycle, and I got curious about whether the hash itself was a bottleneck. So I threw together a quick scratch benchmark (https://github.com/DanielThomas/hollow/tree/murmur3) to see what's what.
The good news is that Yonik Seeley's port used by Hollow HashCodes (hash_hollow) is the fastest I could find (used in Solr), but it appears that the ByteData abstraction, i.e. it's per-byte reads, significantly hurts throughput. The only difference between these benchmarks is one reads bytes using com.netflix.hollow.core.memory.ArrayByteData:
Benchmark (length) Mode Cnt Score Error Units
Murmur3Benchmark.hash_hollow 1 thrpt 5 136862.511 ± 11350.094 ops/ms
Murmur3Benchmark.hash_hollow 10 thrpt 5 81524.319 ± 2689.955 ops/ms
Murmur3Benchmark.hash_hollow 100 thrpt 5 17258.054 ± 302.410 ops/ms
Murmur3Benchmark.hash_hollow 1000 thrpt 5 1898.878 ± 54.612 ops/ms
Murmur3Benchmark.hash_hollow 10000 thrpt 5 194.730 ± 6.152 ops/ms
Murmur3Benchmark.hash_hollow 100000 thrpt 5 19.450 ± 0.537 ops/ms
Murmur3Benchmark.hash_yonik 1 thrpt 5 231423.849 ± 5818.267 ops/ms
Murmur3Benchmark.hash_yonik 10 thrpt 5 113567.630 ± 341.959 ops/ms
Murmur3Benchmark.hash_yonik 100 thrpt 5 22265.668 ± 174.804 ops/ms
Murmur3Benchmark.hash_yonik 1000 thrpt 5 2580.319 ± 7.761 ops/ms
Murmur3Benchmark.hash_yonik 10000 thrpt 5 268.995 ± 1.895 ops/ms
Murmur3Benchmark.hash_yonik 100000 thrpt 5 27.137 ± 0.203 ops/ms
It'd of course be worse again for SegmentedByteArray, where you add a logical shift and bitwise for every com.netflix.hollow.core.memory.ByteData#get.
That got me thinking that there are field read cases like com.netflix.hollow.core.read.engine.object.HollowObjectTypeReadStateShard#readString(com.netflix.hollow.core.memory.ByteData, long, int) that'd likely be impacted too.
So it feels like ByteData abstraction needs to be leakier and start with the a contract that assumes segmented data and provide ways of expose underlying arrays directly, or at least a byte[] contract that handles and copies across the potential spanned array segments.
Full Benchmark Results
Benchmark (length) Mode Cnt Score Error Units
Murmur3Benchmark.hash_guava 1 thrpt 5 187422.727 ± 5990.667 ops/ms
Murmur3Benchmark.hash_guava 10 thrpt 5 90013.621 ± 4670.890 ops/ms
Murmur3Benchmark.hash_guava 100 thrpt 5 15803.129 ± 254.116 ops/ms
Murmur3Benchmark.hash_guava 1000 thrpt 5 1646.876 ± 15.252 ops/ms
Murmur3Benchmark.hash_guava 10000 thrpt 5 168.375 ± 3.224 ops/ms
Murmur3Benchmark.hash_guava 100000 thrpt 5 18.480 ± 0.923 ops/ms
Murmur3Benchmark.hash_hive 1 thrpt 5 225538.310 ± 28378.051 ops/ms
Murmur3Benchmark.hash_hive 10 thrpt 5 110997.225 ± 3593.197 ops/ms
Murmur3Benchmark.hash_hive 100 thrpt 5 19188.473 ± 1607.453 ops/ms
Murmur3Benchmark.hash_hive 1000 thrpt 5 2226.580 ± 149.374 ops/ms
Murmur3Benchmark.hash_hive 10000 thrpt 5 227.843 ± 22.904 ops/ms
Murmur3Benchmark.hash_hive 100000 thrpt 5 23.052 ± 2.436 ops/ms
Murmur3Benchmark.hash_hollow 1 thrpt 5 136862.511 ± 11350.094 ops/ms
Murmur3Benchmark.hash_hollow 10 thrpt 5 81524.319 ± 2689.955 ops/ms
Murmur3Benchmark.hash_hollow 100 thrpt 5 17258.054 ± 302.410 ops/ms
Murmur3Benchmark.hash_hollow 1000 thrpt 5 1898.878 ± 54.612 ops/ms
Murmur3Benchmark.hash_hollow 10000 thrpt 5 194.730 ± 6.152 ops/ms
Murmur3Benchmark.hash_hollow 100000 thrpt 5 19.450 ± 0.537 ops/ms
Murmur3Benchmark.hash_sangupta 1 thrpt 5 167244.867 ± 8154.188 ops/ms
Murmur3Benchmark.hash_sangupta 10 thrpt 5 66672.478 ± 18962.516 ops/ms
Murmur3Benchmark.hash_sangupta 100 thrpt 5 6507.477 ± 9612.646 ops/ms
Murmur3Benchmark.hash_sangupta 1000 thrpt 5 704.660 ± 25.304 ops/ms
Murmur3Benchmark.hash_sangupta 10000 thrpt 5 81.783 ± 110.223 ops/ms
Murmur3Benchmark.hash_sangupta 100000 thrpt 5 6.828 ± 8.232 ops/ms
Murmur3Benchmark.hash_yonik 1 thrpt 5 231423.849 ± 5818.267 ops/ms
Murmur3Benchmark.hash_yonik 10 thrpt 5 113567.630 ± 341.959 ops/ms
Murmur3Benchmark.hash_yonik 100 thrpt 5 22265.668 ± 174.804 ops/ms
Murmur3Benchmark.hash_yonik 1000 thrpt 5 2580.319 ± 7.761 ops/ms
Murmur3Benchmark.hash_yonik 10000 thrpt 5 268.995 ± 1.895 ops/ms
Murmur3Benchmark.hash_yonik 100000 thrpt 5 27.137 ± 0.203 ops/ms
For the 100,000 byte case, throughput for Hollow vs Yonik is 1854MB/s vs 2586 MB/s.
Relates to https://github.com/Netflix/hollow/issues/307