
#BONUS intrinsics that might be useful

Open p0nce opened this issue 3 years ago • 2 comments

Add one here every time you wish for one:

  • [ ] _mm_cvtpd_epi64, which would convert 2x double to 64-bit integers using the MXCSR rounding mode; it would speed things up on ARM and non-AVX x86 => actually an existing AVX512DQ + AVX512VL instruction
  • [x] _mm_abs_ps
  • [x] _mm_movemask_epi16
  • [ ] _mm_cmpge_epi8
  • [x] _mm_cmpge_epi16 (twice)
  • [ ] _mm_cmple_epi8
  • [x] _mm_cmple_epi16
  • [x] _mm_not_si128
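
For illustration, the two unchecked signed byte comparisons above can be emulated with plain SSE2. This is a minimal sketch in C using the Intel header, not code from this library: the helper names `cmpge_epi8_sketch` and `cmple_epi8_sketch` are hypothetical, and the approach (complementing the strict comparison) is only one possible choice.

```c
#include <emmintrin.h> /* SSE2 */

/* Hypothetical helper: per-byte signed a >= b.
   Each result byte is 0xFF where the comparison holds, 0x00 elsewhere. */
static inline __m128i cmpge_epi8_sketch(__m128i a, __m128i b)
{
    /* a >= b is the complement of a < b */
    __m128i lt = _mm_cmplt_epi8(a, b);
    return _mm_xor_si128(lt, _mm_set1_epi8(-1));
}

/* Hypothetical helper: per-byte signed a <= b, i.e. b >= a. */
static inline __m128i cmple_epi8_sketch(__m128i a, __m128i b)
{
    return cmpge_epi8_sketch(b, a);
}
```

The extra XOR costs one instruction compared to a native comparison, which is presumably why a dedicated intrinsic is on the wish list.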

Ideas from Alfred Klomp

  • [ ] mm_absdiff_epu16
  • [ ] mm_absdiff_epu8
  • [ ] mm_blendv_si128
  • [ ] mm_bswap_epi16
  • [ ] mm_bswap_epi32
  • [ ] mm_bswap_epi64
  • [ ] mm_bswap_si128
  • [ ] mm_cmpge_epu16
  • [ ] mm_cmpge_epu8
  • [ ] mm_cmpgt_epu16
  • [ ] mm_cmpgt_epu8
  • [ ] mm_cmple_epu16
  • [ ] mm_cmple_epu8
  • [ ] mm_cmplt_epu16
  • [ ] mm_cmplt_epu8
  • [ ] mm_div255_epu16
  • [ ] mm_div_epu8
  • [ ] mm_divfast_epu16
  • [ ] mm_divfast_epu8
  • [x] mm_max_epu16
  • [x] mm_min_epu16
  • [x] mm_not_si128
  • [ ] mm_scale_epu8
  • [ ] _mm256_unpacklo_si128
  • [ ] _mm256_unpackhi_si128
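
To give an idea of what a few of these would compute, here is a hedged sketch in C with SSE2 only. The `_sketch` helper names are hypothetical, and the formulations are common SSE2 idioms rather than necessarily the ones used on the linked page or in this library.

```c
#include <emmintrin.h> /* SSE2 */

/* Hypothetical helper: per-byte unsigned |a - b|.
   Saturating subtraction clamps negative results to 0, so at most one
   of the two terms is nonzero and OR-ing them yields the difference. */
static inline __m128i absdiff_epu8_sketch(__m128i a, __m128i b)
{
    return _mm_or_si128(_mm_subs_epu8(a, b), _mm_subs_epu8(b, a));
}

/* Hypothetical helper: per-byte unsigned a >= b, using max(a, b) == a. */
static inline __m128i cmpge_epu8_sketch(__m128i a, __m128i b)
{
    return _mm_cmpeq_epi8(_mm_max_epu8(a, b), a);
}

/* Hypothetical helper: per-byte unsigned a > b.
   Flipping the top bit maps unsigned order onto signed order. */
static inline __m128i cmpgt_epu8_sketch(__m128i a, __m128i b)
{
    const __m128i bias = _mm_set1_epi8(-128);
    return _mm_cmpgt_epi8(_mm_xor_si128(a, bias), _mm_xor_si128(b, bias));
}

/* Hypothetical helper: swap the two bytes of every 16-bit lane. */
static inline __m128i bswap_epi16_sketch(__m128i a)
{
    return _mm_or_si128(_mm_slli_epi16(a, 8), _mm_srli_epi16(a, 8));
}
```

The 16-bit unsigned comparisons are less direct on SSE2, since `_mm_max_epu16` only arrives with SSE4.1; the sign-bit flip shown for `cmpgt_epu8_sketch` is one way around that.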

p0nce, Oct 18 '21 22:10

http://www.alfredklomp.com/programming/sse-intrinsics/

p0nce, Jan 24 '22 16:01

complex multiply, complex add, complex sub, complex divide

p0nce, Oct 02 '22 19:10
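
As a rough illustration of the complex multiply idea, here is a minimal sketch in C using only SSE1. It assumes two complex floats per vector stored interleaved as [re0, im0, re1, im1]; that layout and the name `complex_mul_ps_sketch` are assumptions made for the example, not something specified in this issue.

```c
#include <xmmintrin.h> /* SSE */

/* Hypothetical helper: multiplies two pairs of complex floats,
   each vector laid out as [re0, im0, re1, im1].
   (ar + i*ai) * (br + i*bi) = (ar*br - ai*bi) + i*(ar*bi + ai*br) */
static inline __m128 complex_mul_ps_sketch(__m128 a, __m128 b)
{
    /* Broadcast real and imaginary parts of a within each pair */
    __m128 a_re = _mm_shuffle_ps(a, a, _MM_SHUFFLE(2, 2, 0, 0)); /* [ar0, ar0, ar1, ar1] */
    __m128 a_im = _mm_shuffle_ps(a, a, _MM_SHUFFLE(3, 3, 1, 1)); /* [ai0, ai0, ai1, ai1] */
    /* Swap real and imaginary parts of b within each pair */
    __m128 b_sw = _mm_shuffle_ps(b, b, _MM_SHUFFLE(2, 3, 0, 1)); /* [bi0, br0, bi1, br1] */

    __m128 t1 = _mm_mul_ps(a_re, b);    /* [ar*br, ar*bi, ...] */
    __m128 t2 = _mm_mul_ps(a_im, b_sw); /* [ai*bi, ai*br, ...] */

    /* Negate the even lanes of t2 so that the add subtracts ai*bi
       in the real lanes and adds ai*br in the imaginary lanes */
    const __m128 sign_even = _mm_set_ps(0.0f, -0.0f, 0.0f, -0.0f);
    return _mm_add_ps(t1, _mm_xor_ps(t2, sign_even));
}
```

With this interleaved layout, complex add and complex sub are just `_mm_add_ps` and `_mm_sub_ps`; only multiply and divide need lane shuffling. On SSE3 targets, `_mm_addsub_ps` could replace the sign-mask step.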