aqrit

Results 56 comments of aqrit

I have not done a benchmark. I have an old K8 [Athlon 64 X2 (Windsor)], if you'd like me to time some code. I saw maskmovdqu had a performance issue...

> it's a pity that ghostdog didn't include your work in a new version of his mod

I never finished the work, so ghostdog couldn't include it.

memory-mapped ring buffer, perhaps.

any future in SWAR atoi()?
```
// pseudocode
uint64_t v, m, len;
v = unaligned_load_little_endian_u64(p);
m = v + 0x4646464646464646; // roll '9' to 0x7F
v -= 0x3030303030303030; // unpacked...
```
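
The pseudocode above is cut off, so what follows is not a completion of it, only a minimal sketch of the same idea under stated assumptions: a little-endian host, at least 8 readable bytes at `p`, and a GCC/Clang compiler for `__builtin_ctzll`. The function name `swar_parse_digits` is hypothetical, and the multiply constants used to combine the digits are a commonly seen SWAR trick swapped in here, not something taken from the original comment.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: parse up to 8 leading ASCII digits at p.
 * Assumes a little-endian host and 8 readable bytes at p.
 * Stores the number of digits consumed in *len_out. */
uint32_t swar_parse_digits(const char *p, unsigned *len_out)
{
    uint64_t v;
    memcpy(&v, p, 8);                           /* unaligned little-endian load */

    uint64_t hi = v + 0x4646464646464646ULL;    /* rolls '9' to 0x7F; bytes in ':'..0xB9 get bit 7 */
    uint64_t lo = v - 0x3030303030303030ULL;    /* '0'..'9' become 0..9; other bytes get bit 7 */
    uint64_t bad = (hi | lo) & 0x8080808080808080ULL;

    /* Carries and borrows only travel upward, so the lowest flagged
     * byte is always the first non-digit. */
    unsigned len = bad ? (unsigned)__builtin_ctzll(bad) >> 3 : 8;
    *len_out = len;
    if (len == 0)
        return 0;

    /* Move the digits to the top bytes; the vacated low bytes act as leading zeros. */
    uint64_t d = lo << (8 * (8 - len));

    d = (d * 2561) >> 8;                                   /* 10*a + b per byte pair */
    d = ((d & 0x00FF00FF00FF00FFULL) * 6553601) >> 16;     /* 100*ab + cd */
    d = ((d & 0x0000FFFF0000FFFFULL) * 42949672960001ULL) >> 32; /* 10000*abcd + efgh */
    return (uint32_t)d;
}
```

Calling it on `"42abcdef"` should yield 42 with a length of 2; longer numbers would need a loop over 8-byte chunks.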

AVX2 can `atoi()` four strings at once; I don't know how useful this is.
```
__m256i parse_uint_x4(void* base_addr, __m256i offsets_64x4) {
    const __m256i x00 = _mm256_setzero_si256();
    const __m256i x0A =...
```

Unfortunately, it has been [broken](https://github.com/google/cpu_features/issues/4) since day 1, 2+ years ago, which seems like a red flag.

> > The only downside of SIMDe project I see is the inability to do drop-in replacements as all intrinsics are prefixed with `simde_` > > Doesn't that completely...

edit: nevermind.
```
__m128i sse2_abs_sat_s8(__m128i a) {
    __m128i m = _mm_cmpgt_epi8(_mm_setzero_si128(), a); // all-ones in negative lanes
    a = _mm_xor_si128(a, m);                            // ~a in negative lanes
    return _mm_subs_epi8(a, m);                         // +1 with saturation: -128 -> 127
}
```

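```
__m128i sse2_abs_sat_s32(__m128i a) {
    __m128i m = _mm_srai_epi32(a, 31); // all-ones in negative lanes
    a = _mm_xor_si128(a, m);
    a = _mm_sub_epi32(a, m);           // wrapping abs: INT_MIN stays INT_MIN
    m = _mm_srai_epi32(a, 31);         // still negative only for INT_MIN
    return _mm_xor_si128(a, m);        // flips INT_MIN to INT_MAX
}
```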
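
To make the lane arithmetic easier to follow, here is a scalar model of what a single 32-bit lane goes through in `sse2_abs_sat_s32`. It is only an illustrative sketch: the name `abs_sat_s32_model` is hypothetical, and unsigned arithmetic stands in for the wrapping behaviour of the SSE2 instructions so that the C code itself has no signed overflow.

```c
#include <stdint.h>

/* Hypothetical scalar model of one lane of sse2_abs_sat_s32. */
int32_t abs_sat_s32_model(int32_t a)
{
    uint32_t u = (uint32_t)a;
    uint32_t m = 0u - (uint32_t)(a < 0);  /* like srai 31: all-ones if negative */

    u = (u ^ m) - m;                      /* wrapping abs: 0x80000000 stays put */

    m = 0u - (u >> 31);                   /* all-ones only for the INT32_MIN case */
    u ^= m;                               /* 0x80000000 becomes 0x7FFFFFFF */

    return (int32_t)u;                    /* u <= INT32_MAX here, so the cast is safe */
}
```

The second mask is what turns the plain wrapping absolute value into a saturating one: after the first step the only input that is still negative is `INT32_MIN`.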

another uninformed flyby: Suppose the negation of INT_MIN is UB? Then the compiler assumes it never happens, and proceeds to optimize away the INT_MIN handling after the call to `abs()`.
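
A small sketch of that guess, with hypothetical function names. In C, `abs(INT_MIN)` has undefined behaviour when the result is not representable, so a check placed after the call may legally be removed; whether a particular compiler actually does so is an assumption, not something verified here.

```c
#include <limits.h>
#include <stdlib.h>

/* Fragile: the compiler may assume abs() never returns a negative
 * value and drop the r < 0 branch entirely. */
int abs_checked_after(int x)
{
    int r = abs(x);
    if (r < 0)          /* only reachable through undefined behaviour */
        return INT_MAX;
    return r;
}

/* Well-defined: INT_MIN is handled before abs() ever sees it. */
int abs_checked_before(int x)
{
    if (x == INT_MIN)
        return INT_MAX;
    return abs(x);
}
```

Moving the test in front of the call keeps the `INT_MIN` case out of undefined territory, so the optimizer has nothing to assume away.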