easyaspi314
Some Doxygen documentation added in #462
1. `XXH_SIZEOPT` config option
   - `==0`: normal
   - `==1`: Disables forceinline and manual unrolling
   - `==2`: Reuse streaming API for single shot, other dirty size hacks?
2. *Potential* speed boost:...
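To make the `==1` level concrete, here is a rough sketch of how such a switch could gate forced inlining and manual unrolling. Everything prefixed `DEMO_`/`demo_` is a made-up name for illustration, not the actual xxHash macros; only the round arithmetic and the two 32-bit primes are taken from XXH32.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the proposed XXH_SIZEOPT option. */
#ifndef DEMO_SIZEOPT
#  define DEMO_SIZEOPT 0
#endif

/* At level >= 1, drop the forced-inline hint and let the compiler decide. */
#if DEMO_SIZEOPT >= 1
#  define DEMO_FORCE_INLINE static
#elif defined(__GNUC__)
#  define DEMO_FORCE_INLINE static inline __attribute__((always_inline))
#elif defined(_MSC_VER)
#  define DEMO_FORCE_INLINE static __forceinline
#else
#  define DEMO_FORCE_INLINE static inline
#endif

DEMO_FORCE_INLINE uint32_t demo_rotl32(uint32_t x, int r)
{
    return (x << r) | (x >> (32 - r));
}

/* One XXH32 round: multiply-add, rotate, multiply. */
DEMO_FORCE_INLINE uint32_t demo_round(uint32_t acc, uint32_t lane)
{
    acc += lane * 2246822519U;  /* XXH32 prime 2 (0x85EBCA77) */
    acc  = demo_rotl32(acc, 13);
    acc *= 2654435761U;         /* XXH32 prime 1 (0x9E3779B1) */
    return acc;
}

/* Mixes one 16-byte stripe (already split into four lanes). */
static void demo_mix4(uint32_t acc[4], const uint32_t lanes[4])
{
#if DEMO_SIZEOPT >= 1
    /* Size-optimized: a plain loop, no manual unrolling. */
    size_t i;
    for (i = 0; i < 4; i++) {
        acc[i] = demo_round(acc[i], lanes[i]);
    }
#else
    /* Default: manually unrolled. */
    acc[0] = demo_round(acc[0], lanes[0]);
    acc[1] = demo_round(acc[1], lanes[1]);
    acc[2] = demo_round(acc[2], lanes[2]);
    acc[3] = demo_round(acc[3], lanes[3]);
#endif
}
```

Both branches compute the same result, so building with `-DDEMO_SIZEOPT=1` should only change code size and speed, not the hash.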
> but if a "dumb" compiler, say MSVC /O2, see important speed regressions... (╯°□°)╯︵ ┻━┻ ``` C:\code\xxhash> xxhsum.exe -b1 xxhsum.exe 0.8.1 by Yann Collet compiled as 32-bit i386 + SSE2...
This is what I was thinking. It uses some of the naming styles from XXH3. ```c /*! * @internal * @brief Seeds the accumulator lanes for @ref XXH32(). * *...
I think for XXH64, we should just use a nested loop for the bulk loop, as long as MSVC *x64* unrolls it (but MSVC x64 is more liberal in unrolling...
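For reference, the nested-loop shape I mean looks roughly like the sketch below. The `demo_` names are hypothetical, and the read helper is simplified to a native-endian `memcpy` (the real code reads little-endian); the round function and the two 64-bit primes are the XXH64 ones. Whether a given compiler unrolls the inner loop is exactly the thing that would need benchmarking.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define DEMO_PRIME64_1 0x9E3779B185EBCA87ULL
#define DEMO_PRIME64_2 0xC2B2AE3D27D4EB4FULL

static uint64_t demo_rotl64(uint64_t x, int r)
{
    return (x << r) | (x >> (64 - r));
}

/* One XXH64 round: multiply-add, rotate, multiply. */
static uint64_t demo_round64(uint64_t acc, uint64_t lane)
{
    acc += lane * DEMO_PRIME64_2;
    acc  = demo_rotl64(acc, 31);
    acc *= DEMO_PRIME64_1;
    return acc;
}

/* Simplification: native-endian unaligned read. */
static uint64_t demo_read64(const unsigned char *p)
{
    uint64_t v;
    memcpy(&v, p, sizeof(v));
    return v;
}

/*
 * Bulk loop as a nested loop: the outer loop walks 32-byte stripes,
 * the inner loop walks the four 8-byte lanes of one stripe.
 * Returns a pointer just past the last stripe consumed.
 */
static const unsigned char *demo_bulk(uint64_t acc[4],
                                      const unsigned char *p, size_t len)
{
    size_t stripes = len / 32;
    while (stripes--) {
        size_t i;
        for (i = 0; i < 4; i++) {
            acc[i] = demo_round64(acc[i], demo_read64(p));
            p += 8;
        }
    }
    return p;
}
```

The trailing `len % 32` bytes are left for the caller, same as in the unrolled version.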
Draft at [`easyaspi314:modern_xxh32_xxh64`](https://github.com/easyaspi314/xxHash/tree/modern_xxh32_xxh64). I will make a PR once I do some benchmarking. I also changed the mem32/mem64 fields to unsigned char arrays which shouldn't break binary ABI.
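As a sanity check on the ABI claim, something like the following C11 sketch could be used. The two structs are simplified, hypothetical stand-ins (not the real `XXH32_state_t`); the point is only that swapping a `uint32_t[4]` buffer for `unsigned char[16]` keeps the size and field offsets when the neighbouring fields already force 4-byte alignment.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical "before" layout: buffer typed as 32-bit words. */
typedef struct {
    uint32_t total_len;
    uint32_t acc[4];
    uint32_t mem32[4];
    uint32_t memsize;
} demo_state_old;

/* Hypothetical "after" layout: same buffer as raw bytes. */
typedef struct {
    uint32_t      total_len;
    uint32_t      acc[4];
    unsigned char mem32[16];
    uint32_t      memsize;
} demo_state_new;

/* Compile-time checks (C11): the build fails if the layouts diverge. */
_Static_assert(sizeof(demo_state_old) == sizeof(demo_state_new),
               "state size changed");
_Static_assert(offsetof(demo_state_old, mem32) == offsetof(demo_state_new, mem32),
               "buffer offset changed");
_Static_assert(offsetof(demo_state_old, memsize) == offsetof(demo_state_new, memsize),
               "trailing field offset changed");

/* Runtime mirror of the same checks, handy in a test harness. */
static int demo_layout_matches(void)
{
    return sizeof(demo_state_old) == sizeof(demo_state_new)
        && offsetof(demo_state_old, mem32) == offsetof(demo_state_new, mem32)
        && offsetof(demo_state_old, memsize) == offsetof(demo_state_new, memsize);
}
```

This only covers layout; code that accessed the old fields as integers would still need to go through `memcpy`-style reads.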
Should we remove XXH_OLD_NAMES as well?
On a side note, I was toying with a mixed NEON/scalar XXH64. On my Pixel 4a, clang and GCC get the same 2804 MB/s normally, but with half NEON and...
It only seems to affect AArch64, but XXH3 runs *incredibly* with a 6:2 ratio in #632, even (mostly) fixing the lackluster performance from GCC (30% faster, but still slower than...
Ok I'm having a little trouble following the assembly code (I'll need some time to digest it 😅), but by caching do you mean something like a ring buffer? ```c...