Kendall Willets

7 comments by Kendall Willets

Yes, obviously it doesn't leave the data in cache. Unfortunately, the non-aligned compressed data is hard to write non-temporally. Maybe it could be packed into a buffer register or two...

I had a thought related to this for the decoder which might be interesting. It adds a few steps vs. a raw lookup, but it may allow scaling to larger...

I was looking at this for UTF-8 conversion as well, but it seems easier when the control byte is already available. UTF-8 needs a lot of pmovmskb/pdep/tzcnt operations to get the...

The length is 1 + the rightmost index in the decoding shuffle, which is within the last four entries.
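
To make the relationship concrete, here is a minimal scalar sketch. It assumes the usual Stream VByte layout (a control byte with four 2-bit codes, where code `c` means `c + 1` data bytes) and a 16-entry shuffle mask that uses `0xFF` for zero-filled output bytes. The names `build_shuffle`, `length_from_shuffle` and `length_from_key` are hypothetical, chosen for illustration, and are not the library's API.

```c
#include <stdint.h>

/* Build the 16-entry decoding shuffle for one control byte.
 * Assumption: entry values are source byte indices, 0xFF means "write zero". */
static void build_shuffle(uint8_t key, uint8_t mask[16]) {
    uint8_t src = 0;
    for (int i = 0; i < 4; i++) {
        int nbytes = ((key >> (2 * i)) & 3) + 1;   /* 1..4 bytes for integer i */
        for (int j = 0; j < 4; j++)
            mask[4 * i + j] = (j < nbytes) ? src++ : 0xFF;
    }
}

/* Length = 1 + the rightmost real index, which always sits in entries 12..15
 * because the last integer occupies at least one byte. */
static int length_from_shuffle(const uint8_t mask[16]) {
    for (int j = 15; j >= 12; j--)
        if (mask[j] != 0xFF)
            return mask[j] + 1;
    return 0; /* not reached for masks produced by build_shuffle */
}

/* Reference: sum the four 2-bit codes directly. */
static int length_from_key(uint8_t key) {
    return 4 + (key & 3) + ((key >> 2) & 3) + ((key >> 4) & 3) + (key >> 6);
}
```

For the control byte `0x1B` (codes 3, 2, 1, 0) both functions give a data length of 10 bytes.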

It is definitely amenable to various bithacks. I used the permutation table since it's available, but the 2-bit fields could also be summed in a few instructions. mod15 compiles to...
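
As a rough illustration of summing the 2-bit fields without a table, the sketch below shows two possible variants. Both are assumptions about how such a bithack might look, not a statement of what was actually used: one folds the fields pairwise, the other relies on the fact that a value modulo 15 equals the sum of its nibbles when that sum is below 15 (here it is at most 12). The function names are hypothetical.

```c
#include <stdint.h>

/* Pairwise fold: add fields 0,2 to fields 1,3, leaving two nibbles (each 0..6),
 * then add the two nibbles. */
static int key_length_swar(uint8_t key) {
    unsigned s = (key & 0x33u) + ((key >> 2) & 0x33u);
    return 4 + (int)((s + (s >> 4)) & 0x0Fu);
}

/* Same first step, then use s % 15 to sum the two nibbles.
 * Valid because the nibble sum never exceeds 12. */
static int key_length_mod15(uint8_t key) {
    unsigned s = (key & 0x33u) + ((key >> 2) & 0x33u);
    return 4 + (int)(s % 15u);
}
```

Whether either variant beats the table lookup depends on the surrounding loop and on how the compiler lowers the modulo, so it would need measuring.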

@lemire @aqrit I started a [branch](https://github.com/lemire/streamvbyte/tree/prefix_lengths) to see if precomputing byte lengths 8-at-a-time and prefix summing would speed up the decode loop. The pointer arithmetic in each decode_avx seemed like...
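
The general idea of prefix-summing eight byte lengths at once can be sketched as follows. This is an illustrative assumption, not the code in the linked branch: lengths are packed one per byte into a `uint64_t`, and a multiply by `0x0101010101010101` produces an inclusive prefix sum in each byte lane. That works here because eight groups total at most 128 bytes, so no lane overflows. All function names are hypothetical.

```c
#include <stdint.h>

/* Data length in bytes for one control byte (four 2-bit codes, code c = c+1 bytes). */
static uint8_t key_length(uint8_t key) {
    return (uint8_t)(4 + (key & 3) + ((key >> 2) & 3) + ((key >> 4) & 3) + (key >> 6));
}

/* Pack the lengths of 8 control bytes, one per byte lane (lane i = key i). */
static uint64_t pack_lengths(const uint8_t keys[8]) {
    uint64_t packed = 0;
    for (int i = 0; i < 8; i++)
        packed |= (uint64_t)key_length(keys[i]) << (8 * i);
    return packed;
}

/* Inclusive prefix sum across the 8 byte lanes in one multiply.
 * Lane j of the result holds length[0] + ... + length[j]. */
static uint64_t prefix_sum_bytes(uint64_t packed) {
    return packed * 0x0101010101010101ULL;
}

/* Byte offset of group i's data within the 8-group block (exclusive prefix). */
static unsigned data_offset(uint64_t prefix, int i) {
    return i == 0 ? 0u : (unsigned)((prefix >> (8 * (i - 1))) & 0xFFu);
}
```

With the offsets known up front, each group's load address no longer depends on the previous group's pointer update; the top byte of the prefix gives the total to advance after all eight groups.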

One note about shuffles is that we can switch endianness in u32 lanes by setting index ^= 3. So if we were, e.g., retrieving index 0 little-endian, we get 3...
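
A small scalar model may help show the `index ^= 3` trick. It assumes the index in question is the source byte position in a `pshufb`-style shuffle, and it emulates the shuffle in plain C rather than using intrinsics. `shuffle16` and `flip_lane_endianness` are hypothetical helper names.

```c
#include <stdint.h>

/* Scalar model of a 16-byte shuffle with pshufb semantics:
 * a set high bit in the index yields zero, otherwise the low 4 bits select a source byte. */
static void shuffle16(const uint8_t src[16], const uint8_t idx[16], uint8_t dst[16]) {
    for (int i = 0; i < 16; i++)
        dst[i] = (idx[i] & 0x80) ? 0 : src[idx[i] & 0x0F];
}

/* XOR each real source index with 3, which mirrors the byte position
 * inside its 4-byte lane (0<->3, 1<->2). Zero-fill entries are left alone. */
static void flip_lane_endianness(const uint8_t idx[16], uint8_t flipped[16]) {
    for (int i = 0; i < 16; i++)
        flipped[i] = (idx[i] & 0x80) ? idx[i] : (uint8_t)(idx[i] ^ 3);
}
```

Applied to the identity shuffle, the flipped mask reads source bytes 3, 2, 1, 0, 7, 6, 5, 4 and so on, i.e. a byte swap within each u32 lane of the source.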