easyaspi314 comments

Results: 132 comments of easyaspi314

Suggestions list for future evolutions

Some Doxygen documentation added in #462

Suggestions list for future evolutions

1. `XXH_SIZEOPT` config option
   - `==0`: normal
   - `==1`: Disables forceinline and manual unrolling
   - `==2`: Reuse streaming API for single shot, other dirty size hacks?
2. *Potential* speed boost:...
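A rough sketch of how a size-optimization switch like the one suggested above could gate force-inlining and manual unrolling. `DEMO_SIZEOPT`, `DEMO_INLINE`, `demo_round` and `demo_sum` are hypothetical names made up for illustration, not the library's actual code; only the round arithmetic mirrors the XXH32 round.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical switch: 0 = normal, >= 1 = no force-inline, no manual unrolling. */
#ifndef DEMO_SIZEOPT
#  define DEMO_SIZEOPT 0
#endif

#if DEMO_SIZEOPT >= 1
#  define DEMO_INLINE static                 /* let the compiler decide */
#elif defined(__GNUC__)
#  define DEMO_INLINE static inline __attribute__((always_inline))
#elif defined(_MSC_VER)
#  define DEMO_INLINE static __forceinline
#else
#  define DEMO_INLINE static inline
#endif

/* One XXH32-style round: multiply-add, rotate left by 13, multiply. */
DEMO_INLINE uint32_t demo_round(uint32_t acc, uint32_t input)
{
    acc += input * 2246822519U;
    acc  = (acc << 13) | (acc >> 19);
    return acc * 2654435761U;
}

/* Folds n words into acc; the unrolled path is compiled out when sizing down. */
uint32_t demo_sum(const uint32_t *p, size_t n, uint32_t acc)
{
    size_t i = 0;
#if DEMO_SIZEOPT == 0
    /* Manually unrolled by 4 (speed-oriented build only). */
    for (; i + 4 <= n; i += 4) {
        acc = demo_round(acc, p[i + 0]);
        acc = demo_round(acc, p[i + 1]);
        acc = demo_round(acc, p[i + 2]);
        acc = demo_round(acc, p[i + 3]);
    }
#endif
    /* Rolled loop: handles the tail, or everything when DEMO_SIZEOPT >= 1. */
    for (; i < n; i++) {
        acc = demo_round(acc, p[i]);
    }
    return acc;
}
```

Both paths apply the same rounds in the same order, so building with `-DDEMO_SIZEOPT=1` should only change code size and speed, not the result.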

XXH32/XXH64 modernization

> but if a "dumb" compiler, say MSVC /O2, see important speed regressions...

(╯°□°）╯︵ ┻━┻

```
C:\code\xxhash> xxhsum.exe -b1
xxhsum.exe 0.8.1 by Yann Collet compiled as 32-bit i386 + SSE2...
```

XXH32/XXH64 modernization

This is what I was thinking. It uses some of the naming styles from XXH3.

```c
/*!
 * @internal
 * @brief Seeds the accumulator lanes for @ref XXH32().
 *
 *...
```

XXH32/XXH64 modernization

I think for XXH64, we should just use a nested loop for the bulk loop, as long as MSVC *x64* unrolls it (but MSVC x64 is more liberal in unrolling...
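For illustration, a nested bulk loop of the kind described here might look roughly like the sketch below. The `demo_*` names are hypothetical and this is not the code from the draft branch; the read helper assumes a little-endian host, and whether a given compiler unrolls the inner loop is exactly the open question.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define DEMO_PRIME64_1 0x9E3779B185EBCA87ULL
#define DEMO_PRIME64_2 0xC2B2AE3D27D4EB4FULL

static uint64_t demo_rotl64(uint64_t x, int r)
{
    return (x << r) | (x >> (64 - r));
}

/* Unaligned 64-bit read via memcpy (assumes a little-endian host). */
static uint64_t demo_read64(const unsigned char *p)
{
    uint64_t v;
    memcpy(&v, p, sizeof v);
    return v;
}

/* One XXH64-style round on a single lane. */
static uint64_t demo_round64(uint64_t acc, uint64_t input)
{
    acc += input * DEMO_PRIME64_2;
    acc  = demo_rotl64(acc, 31);
    return acc * DEMO_PRIME64_1;
}

/* Consumes as many full 32-byte stripes as fit in len.
 * Returns the number of bytes consumed; the caller handles the tail. */
size_t demo_bulk64(uint64_t acc[4], const unsigned char *p, size_t len)
{
    size_t consumed = 0;
    while (len - consumed >= 32) {
        /* Inner loop over the 4 lanes instead of 4 hand-written rounds. */
        for (int lane = 0; lane < 4; lane++) {
            acc[lane] = demo_round64(acc[lane],
                                     demo_read64(p + consumed + 8 * lane));
        }
        consumed += 32;
    }
    return consumed;
}
```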

Draft at [`easyaspi314:modern_xxh32_xxh64`](https://github.com/easyaspi314/xxHash/tree/modern_xxh32_xxh64). I will make a PR once I do some benchmarking. I also changed the mem32/mem64 fields to unsigned char arrays which shouldn't break binary ABI.
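As a quick sanity check on the ABI claim, here is a sketch with two simplified, hypothetical state structs (`demo_state_old` and `demo_state_new`, not the real `XXH32_state_t` layout). It assumes a C11 compiler for `_Static_assert` and `_Alignof`. Swapping a `uint32_t[4]` buffer for `unsigned char[16]` keeps size, alignment and field offsets the same as long as the struct still has other 32-bit members.

```c
#include <stddef.h>
#include <stdint.h>

/* Simplified stand-in for a state with a word-typed buffer. */
typedef struct {
    uint32_t total_len;
    uint32_t acc[4];
    uint32_t mem32[4];        /* 16 bytes, typed as words */
    uint32_t memsize;
} demo_state_old;

/* Same state with the buffer retyped as raw bytes. */
typedef struct {
    uint32_t total_len;
    uint32_t acc[4];
    unsigned char mem32[16];  /* 16 bytes, typed as bytes */
    uint32_t memsize;
} demo_state_new;

/* Compile-time checks: a mismatch fails the build. */
_Static_assert(sizeof(demo_state_old) == sizeof(demo_state_new),
               "size changed");
_Static_assert(_Alignof(demo_state_old) == _Alignof(demo_state_new),
               "alignment changed");
_Static_assert(offsetof(demo_state_old, mem32) == offsetof(demo_state_new, mem32),
               "buffer offset changed");
_Static_assert(offsetof(demo_state_old, memsize) == offsetof(demo_state_new, memsize),
               "trailing field offset changed");

/* Runtime variant of the same checks; returns 1 when the layouts agree. */
int demo_layout_matches(void)
{
    return sizeof(demo_state_old) == sizeof(demo_state_new)
        && offsetof(demo_state_old, mem32) == offsetof(demo_state_new, mem32)
        && offsetof(demo_state_old, memsize) == offsetof(demo_state_new, memsize);
}
```

The byte array alone would only need 1-byte alignment, so the check depends on the neighbouring `uint32_t` fields keeping the struct's alignment unchanged.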

XXH32/XXH64 modernization

Should we remove XXH_OLD_NAMES as well?

XXH32/XXH64 modernization

On a side note, I was toying with a mixed NEON/scalar XXH64. On my Pixel 4a, clang and GCC get the same 2804 MB/s normally, but with half NEON and...

XXH32/XXH64 modernization

It only seems to affect AArch64, but XXH3 runs *incredibly* with a 6:2 ratio in #632, even (mostly) fixing the lackluster performance from GCC (30% faster, but still slower than...

[RFC] cache secret array in assembly code for aarch64 SVE

Ok I'm having a little trouble following the assembly code (I'll need some time to digest it 😅), but by caching do you mean something like a ring buffer? ```c...