Jamie Brandon
IPC doesn't seem terrible (~309B instructions over ~176B cycles, so roughly 1.76 IPC):

```
175,850,971,567      cpu_core/cycles/                  (35.11%)
309,221,910,919      cpu_core/instructions/            (42.03%)
  3,567,518,372      cpu_core/cache-references/        (48.93%)
  1,930,730,082      cpu_core/cache-misses/            (55.69%)
  1,804,591,591      cpu_core/L1-dcache-load-misses/   (62.61%)
 61,552,296,620      cpu_core/L1-dcache-loads/         (62.46%)
 28,322,845,875      cpu_core/L1-dcache-stores/        (62.46%)
    324,262,031      cpu_core/L1-icache-load-misses/   (55.56%)
                     cpu_core/L1-icache-loads/...
```
Here are some perf samples - https://gist.github.com/jamii/c6cbde8b172380ba974ccd02933da3e5

Big offenders for various misses are:

* various hashmap functions
* tree.lookup_from_memory/table_immutable.get (binary search)
* memset (zeroing blocks in grid)
* sort
* ...
We spend about a third of cpu time in blake3. Building with release-fast gives ~10% throughput improvement. Building with -Dcpu=alderlake gives none. I checked that the output of vsr.checksum is using the...
https://github.com/ziglang/zig/blob/6d44a6222d6eba600deb7f16c124bfa30628fb60/lib/std/crypto/benchmark.zig#L404 reports 90 MB/s for blake3. https://raw.githubusercontent.com/BLAKE3-team/BLAKE3-specs/master/blake3.pdf reports 0.5 cycles/byte for single-threaded blake3 on an older cpu. At a ~3-4GHz clock, 0.5 cycles/byte works out to about 6-8 GB/s on my cpu, which roughly matches...
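As a local sanity check of those numbers, a one-shot microbenchmark along these lines is enough (a sketch: the buffer size and iteration count are arbitrary, and it times Zig's std Blake3 directly rather than vsr.checksum):

```zig
const std = @import("std");

pub fn main() !void {
    const Blake3 = std.crypto.hash.Blake3;

    // Hash a 64KiB buffer repeatedly and report throughput.
    var buf: [64 * 1024]u8 = undefined;
    @memset(&buf, 0xab);

    var out: [Blake3.digest_length]u8 = undefined;
    const iterations: usize = 10_000;

    var timer = try std.time.Timer.start();
    var i: usize = 0;
    while (i < iterations) : (i += 1) {
        Blake3.hash(&buf, &out, .{});
        std.mem.doNotOptimizeAway(&out);
    }
    const elapsed_ns = timer.read();

    // Bytes per nanosecond is numerically the same as GB/s.
    const total_bytes: f64 = @floatFromInt(buf.len * iterations);
    const gb_per_s = total_bytes / @as(f64, @floatFromInt(elapsed_ns));
    std.debug.print("blake3: {d:.2} GB/s\n", .{gb_per_s});
}
```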
As further evidence that we're cpu-bound, building in release-fast gives me a 17% throughput increase. If someone can figure out a way to diff the two profiles, maybe we'd get...
> Also, out of interest, what is the effect of making our checksum function a memset(0) function that always returns true when validated?

Hashes are used as unique identifiers, so...
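For concreteness, the proposed experiment amounts to something like this (a hypothetical stub; not the actual vsr.checksum signature):

```zig
// No-op checksum for benchmarking only: ignores the input and returns a
// constant, so every block appears to validate. Unsafe outside an
// experiment, because these hashes double as unique identifiers, and two
// distinct blocks would now share the identifier 0.
fn checksum_noop(source: []const u8) u128 {
    _ = source;
    return 0;
}
```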
> We may be able to amortize more of the hash function setup cost in this way.

It might be that this is the weak spot in the Zig code...
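For reference, the setup-vs-per-byte split in Zig's std Blake3 streaming interface looks roughly like this (a sketch; whether init dominates for our message sizes is exactly the open question):

```zig
const std = @import("std");
const Blake3 = std.crypto.hash.Blake3;

fn checksum_streaming(parts: []const []const u8) [Blake3.digest_length]u8 {
    var hasher = Blake3.init(.{}); // setup cost, paid once per message.
    for (parts) |part| hasher.update(part); // per-byte cost.
    var out: [Blake3.digest_length]u8 = undefined;
    hasher.final(&out);
    return out;
}
```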
I've added spans for io_callback (one yield-free region) and io_flush (multiple io_callbacks with no io_uring submission in between). On my laptop the longest io_callbacks are ~200-250ms for table mutable sorts...
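The spans themselves are just wall-clock timings of yield-free regions, roughly like this (a sketch with a hypothetical 200ms reporting threshold, not the actual instrumentation):

```zig
const std = @import("std");

// Time one yield-free region (an io_callback) and flag the long ones.
fn timed_io_callback(callback: *const fn () void) !void {
    var timer = try std.time.Timer.start();
    callback();
    const elapsed_ms = timer.read() / std.time.ns_per_ms;
    if (elapsed_ms >= 200) {
        std.log.warn("io_callback blocked the event loop for {d}ms", .{elapsed_ms});
    }
}
```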
> we shouldn't try to bypass the cache during compaction, to save the cache miss (but still pay the I/O)

The idea was not to save the cache miss, but...
Maybe another way to think about this: If a block was not in the cache, being touched by compaction does not indicate that it is likely to be read again...
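That suggests an admission rule along these lines (hypothetical names, not the actual grid cache API): compaction may refresh a block that is already resident, but never admits a cold one on its own behalf.

```zig
const std = @import("std");

// Toy resident-set check standing in for the grid's block cache.
const BlockCache = struct {
    resident: std.AutoHashMap(u64, void),

    // Called when compaction reads a block: re-touch it only if it was
    // already hot; a cold block touched by compaction is not admitted.
    fn on_compaction_read(cache: *BlockCache, address: u64) bool {
        return cache.resident.contains(address);
    }
};
```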