aqrit

Results 80 comments of aqrit

My thoughts on parsing integers start at https://stackoverflow.com/a/74453028, though I don't know if it actually makes anything faster.

Base16 (hex) can be decoded similarly to base32hex, except first we’d `v = _mm_add_epi8(v, _mm_set1_epi8(-1))` to make the spans line up. We could also get rid of both the multiply...

> multiplication vs shift

I was incorrectly assuming a bswap would be required. So it would have been `shift -> or -> shuffle` vs `pmaddubsw -> shuffle`. However, using `pack`...

Note: the zero_masks table is probably not needed here, because output bytes are formed from only two input bytes. At worst, bad chars could be zeroed out using `blend`?

Type punning via union is illegal in C++. C++20 added `std::bit_cast`. Without `bit_cast` one could use `memcpy` for this; however, is that still somewhat undefined? Type punning via union is...

AFAIK, `memcpy` is the only "safe" way to do this in C++ before C++20.

```
#ifdef __cplusplus
# if __cplusplus >= 202002L
// use bitcast
# else
// use memcpy...
```
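As a rough illustration of the bit_cast-or-memcpy idea, here is a minimal sketch. The names `pun_cast` and `float_bits` are hypothetical and not taken from any library, and the extra `__cpp_lib_bit_cast` check is an assumption added here so the code falls back to `memcpy` when the standard library does not provide `std::bit_cast`.

```cpp
#include <cstdint>
#include <cstring>
#include <type_traits>

#if defined(__cplusplus) && __cplusplus >= 202002L
#  include <bit>  // std::bit_cast, and the __cpp_lib_bit_cast feature macro
#endif

// Hypothetical helper: reinterpret the bytes of `src` as a value of type To.
template <class To, class From>
To pun_cast(const From& src) noexcept {
    static_assert(sizeof(To) == sizeof(From), "sizes must match");
    static_assert(std::is_trivially_copyable<To>::value &&
                  std::is_trivially_copyable<From>::value,
                  "both types must be trivially copyable");
#if defined(__cpp_lib_bit_cast)
    // C++20 and later: use the standard facility.
    return std::bit_cast<To>(src);
#else
    // Before C++20: copy the object representation with memcpy.
    To dst;
    std::memcpy(&dst, &src, sizeof(To));
    return dst;
#endif
}

// Example use: view the bit pattern of a float as a 32-bit integer.
inline std::uint32_t float_bits(float f) {
    return pun_cast<std::uint32_t>(f);
}
```

Compilers generally turn a fixed-size `memcpy` like this into a plain register move, so the fallback should not cost anything in practice.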

`MSVC` and `ICC` define `_MM_HINT_T0` as `1`. `GCC`, `CLANG`, and `ICX` define it as `3`. I can confirm that `SIMDE_MM_HINT_T0` produces the wrong prefetch (`PREFETCHT2`) under gcc. But the correct...

> Maybe, this calls for a delicate compiler filtering maybe, something like:
> ```
> #if defined(SIMDE_X86_SSE_NATIVE) && defined(_MM_HINT_T0)
> # define SIMDE_MM_HINT_T0 _MM_HINT_T0
> #else
> # define SIMDE_MM_HINT_T0 3
> #endif
> ```

Might get...
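Because the numeric value of the hint differs between compilers, code that calls the native intrinsic is safest passing the compiler's own `_MM_HINT_T0` rather than a hard-coded number. A minimal sketch of that, assuming an x86 target; `prefetch_t0`, `sum_with_prefetch`, `EXAMPLE_X86`, and the look-ahead distance are all hypothetical choices made for this example and are not part of SIMDe:

```cpp
#include <cstddef>
#include <cstdint>

#if defined(__x86_64__) || defined(__i386__) || defined(_M_X64) || defined(_M_IX86)
#  include <xmmintrin.h>  // _mm_prefetch, _MM_HINT_T0
#  define EXAMPLE_X86 1
#endif

// Hypothetical helper: request a T0 prefetch using the compiler's own constant.
inline void prefetch_t0(const void* p) {
#if defined(EXAMPLE_X86)
    _mm_prefetch(static_cast<const char*>(p), _MM_HINT_T0);
#else
    (void)p;  // no-op on targets without the SSE prefetch intrinsic
#endif
}

// Sum an array while prefetching a fixed distance ahead of the read position.
inline std::uint64_t sum_with_prefetch(const std::uint32_t* data, std::size_t n) {
    const std::size_t ahead = 16;  // arbitrary look-ahead, in elements
    std::uint64_t total = 0;
    for (std::size_t i = 0; i < n; ++i) {
        if (i + ahead < n) {
            prefetch_t0(data + i + ahead);
        }
        total += data[i];
    }
    return total;
}
```

The prefetch is only a hint, so the result is the same whether or not the hardware acts on it.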

`_mm256_slli_epi16` also "needs a compile stage constant for the 2nd parameter"... related: https://github.com/simd-everywhere/simde/issues/905#issuecomment-1286352881

Note: it works if `value` is `const` instead of `var`. Also, the same behavior occurs when bitcasting to a vector of `u1` instead of `bool`.