intel-intrinsics
# BONUS intrinsics that might be useful
Add one here every time you wish for one:
- [ ] _mm_cvtpd_epi64, which would convert 2x double using the MXCSR rounding mode and would speed things up on ARM and non-AVX x86 (it actually already exists as an AVX512DQ + AVX512VL instruction)
- [x] _mm_abs_ps
- [x] _mm_movemask_epi16
- [ ] _mm_cmpge_epi8
- [x] _mm_cmpge_epi16 (twice)
- [ ] _mm_cmple_epi8
- [x] _mm_cmple_epi16
- [x] _mm_not_si128
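
For readers wondering what these wished-for intrinsics would compute, here is a minimal C sketch of how a few of them could be emulated with plain SSE2. The `wish_` names are hypothetical and chosen only for this illustration; this is not necessarily how the library implements the checked items.

```c
#include <emmintrin.h> /* SSE2 */

/* Hypothetical helper: bitwise NOT of all 128 bits.
   Comparing a register with itself yields all ones. */
static inline __m128i wish_not_si128(__m128i a)
{
    return _mm_xor_si128(a, _mm_cmpeq_epi32(a, a));
}

/* Signed 16-bit a >= b, computed as NOT (b > a). */
static inline __m128i wish_cmpge_epi16(__m128i a, __m128i b)
{
    return wish_not_si128(_mm_cmpgt_epi16(b, a));
}

/* Signed 16-bit a <= b, computed as NOT (a > b). */
static inline __m128i wish_cmple_epi16(__m128i a, __m128i b)
{
    return wish_not_si128(_mm_cmpgt_epi16(a, b));
}

/* One result bit per 16-bit lane, taken from the lane's sign bit.
   Signed saturating pack keeps each sign, so the byte movemask
   of the packed low half gives the 8 bits we want. */
static inline int wish_movemask_epi16(__m128i a)
{
    return _mm_movemask_epi8(_mm_packs_epi16(a, _mm_setzero_si128()));
}
```

The "greater or equal" and "less or equal" variants cost one extra XOR compared with the native `_mm_cmpgt_epi16`, which is presumably why they are wished for as ready-made helpers.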
Ideas from Alfred Klomp
- [ ] mm_absdiff_epu16
- [ ] mm_absdiff_epu8
- [ ] mm_blendv_si128
- [ ] mm_bswap_epi16
- [ ] mm_bswap_epi32
- [ ] mm_bswap_epi64
- [ ] mm_bswap_si128
- [ ] mm_cmpge_epu16
- [ ] mm_cmpge_epu8
- [ ] mm_cmpgt_epu16
- [ ] mm_cmpgt_epu8
- [ ] mm_cmple_epu16
- [ ] mm_cmple_epu8
- [ ] mm_cmplt_epu16
- [ ] mm_cmplt_epu8
- [ ] mm_div255_epu16
- [ ] mm_div_epu8
- [ ] mm_divfast_epu16
- [ ] mm_divfast_epu8
- [x] mm_max_epu16
- [x] mm_min_epu16
- [x] mm_not_si128
- [ ] mm_scale_epu8
- [ ] _mm256_unpacklo_si128
- [ ] _mm256_unpackhi_si128
http://www.alfredklomp.com/programming/sse-intrinsics/
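
As a rough illustration of the unsigned helpers in the list above, the following C sketch shows one common way to build them from SSE2 min/max and saturating subtraction. The `sketch_` names are hypothetical, and the code is an assumption about what such helpers would compute rather than a copy of the linked page.

```c
#include <emmintrin.h> /* SSE2 */

/* Unsigned 8-bit a >= b: true where max(a, b) is a. */
static inline __m128i sketch_cmpge_epu8(__m128i a, __m128i b)
{
    return _mm_cmpeq_epi8(_mm_max_epu8(a, b), a);
}

/* Unsigned 8-bit a <= b: true where min(a, b) is a. */
static inline __m128i sketch_cmple_epu8(__m128i a, __m128i b)
{
    return _mm_cmpeq_epi8(_mm_min_epu8(a, b), a);
}

/* Unsigned 8-bit a > b, computed as NOT (a <= b). */
static inline __m128i sketch_cmpgt_epu8(__m128i a, __m128i b)
{
    return _mm_andnot_si128(sketch_cmple_epu8(a, b), _mm_set1_epi8(-1));
}

/* Unsigned 8-bit |a - b|: one of the two saturating
   differences is always zero, so OR-ing them gives the result. */
static inline __m128i sketch_absdiff_epu8(__m128i a, __m128i b)
{
    return _mm_or_si128(_mm_subs_epu8(a, b), _mm_subs_epu8(b, a));
}
```

The 16-bit unsigned variants would follow the same pattern, except that `_mm_max_epu16` and `_mm_min_epu16` only exist natively from SSE4.1 onward.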
Complex arithmetic: complex multiply, complex add, complex sub, complex divide
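
To make the complex-arithmetic wish concrete, here is a minimal C sketch of a packed complex multiply using only SSE1 operations. It assumes two single-precision complex numbers per register in interleaved layout `[re0, im0, re1, im1]`; the name `sketch_cmul_ps` and the layout are assumptions made for this example. With this layout, complex add and sub are simply `_mm_add_ps` and `_mm_sub_ps`.

```c
#include <xmmintrin.h> /* SSE */

/* Multiply two pairs of complex floats laid out as [re0, im0, re1, im1].
   (a + bi)(c + di) = (ac - bd) + (bc + ad)i */
static inline __m128 sketch_cmul_ps(__m128 x, __m128 y)
{
    __m128 yre = _mm_shuffle_ps(y, y, _MM_SHUFFLE(2, 2, 0, 0)); /* c0 c0 c1 c1 */
    __m128 yim = _mm_shuffle_ps(y, y, _MM_SHUFFLE(3, 3, 1, 1)); /* d0 d0 d1 d1 */
    __m128 xsw = _mm_shuffle_ps(x, x, _MM_SHUFFLE(2, 3, 0, 1)); /* b0 a0 b1 a1 */

    __m128 t1 = _mm_mul_ps(x, yre);   /* ac, bc */
    __m128 t2 = _mm_mul_ps(xsw, yim); /* bd, ad */

    /* Flip the sign of the even lanes of t2 so the add yields ac - bd
       in the real lanes and bc + ad in the imaginary lanes. */
    const __m128 sign = _mm_setr_ps(-0.0f, 0.0f, -0.0f, 0.0f);
    return _mm_add_ps(t1, _mm_xor_ps(t2, sign));
}
```

On SSE3 hardware the sign flip and add could be replaced by a single `_mm_addsub_ps`; the version above avoids it so that it only needs the baseline instruction set.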