Casey Muratori
Modern Meow Hash (v0.4 and up) is actually faster than Mum on small inputs now in randomized testing. The version checked into this tree is a very old version and...
Probably copy pasta from u64, but it should not have the "unsigned". - Casey
Because we had to change the vpxor to a vxorps to ensure AVX-1 compatibility, we probably want to go ahead and use PS everywhere now, because we don't want any...
Right now there is a really hacky thing called StartGateCounter that is used to attempt to give threads a chance to receive their instructions and start somewhat in lockstep where...
At the moment, I haven't looked at the read/write ASM, so I want to remember to check whether the xors use a memory op to ensure 2-load 2-store...
At the moment, blandwidth doesn't probe for NUMA patterns on the CPU (or across CPUs). It would not be a particularly difficult thing to do, it would just be a...
The code in blandwidth is designed to make it easier for someone to make a linux_blandwidth.c file that would allow blandwidth to be built on Linux. However, I am far...
Although in theory the 128-bit routines in Blandwidth are written to ensure they can use 2 read and 2 write ports on every cycle where available, and CLANG produces ASM...
To allow for hashing files larger than available memory, and to prevent possible incompatibilities with ftell on 32-bit builds, I'd like to stop reading files into memory wholesale and start...
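As a rough illustration of the chunked approach described above, here is a minimal sketch in C. It is not the project's actual code: `stream_hash`, `stream_hash_begin`, `stream_hash_absorb`, and `hash_file_streaming` are hypothetical names, the 64 KB buffer size is an arbitrary choice, and 64-bit FNV-1a stands in for the real hash only so the example is self-contained. The point is that the file is consumed with repeated `fread` calls, so neither the whole file nor its size (and therefore `ftell`) is ever needed.

```c
#include <stdint.h>
#include <stdio.h>

/* Hypothetical incremental hash state. FNV-1a (64-bit) is a stand-in here,
   NOT the hash used by the project. */
typedef struct { uint64_t state; } stream_hash;

static void stream_hash_begin(stream_hash *h)
{
    h->state = 0xcbf29ce484222325ULL; /* FNV-1a 64-bit offset basis */
}

/* Fold `len` bytes into the running state. Calling this on consecutive
   chunks gives the same result as one call on the concatenated data. */
static void stream_hash_absorb(stream_hash *h, const void *data, size_t len)
{
    const unsigned char *p = (const unsigned char *)data;
    for (size_t i = 0; i < len; ++i) {
        h->state ^= p[i];
        h->state *= 0x100000001b3ULL; /* FNV-1a 64-bit prime */
    }
}

/* Hash a file in fixed-size chunks without ever asking for its length.
   Returns 1 on success (hash written to *out), 0 on open or read failure. */
static int hash_file_streaming(const char *path, uint64_t *out)
{
    static unsigned char buffer[64 * 1024]; /* assumed chunk size */
    stream_hash h;
    size_t got;
    int ok;

    FILE *f = fopen(path, "rb");
    if (!f) return 0;

    stream_hash_begin(&h);
    while ((got = fread(buffer, 1, sizeof(buffer), f)) > 0) {
        stream_hash_absorb(&h, buffer, got);
    }

    ok = !ferror(f); /* distinguish end-of-file from a read error */
    fclose(f);
    if (ok) *out = h.state;
    return ok;
}
```

A caller would pass a path and a `uint64_t` to receive the result, for example `hash_file_streaming("big.iso", &digest)`. Memory use stays fixed at one buffer regardless of file size. A real hash with wide internal blocks would also need to carry partial blocks across chunk boundaries, which this byte-at-a-time stand-in does not have to deal with.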
1) We do not know why CLANG generates subtract-from-pointer loads instead of add-to-pointer loads. It _says_ it's faster when _it_ generates the code inside meow_bench, but when we extract the...