AVX-AVX2-Example-Code
Example code for Intel AVX / AVX2 intrinsics.
AVX / AVX2 Intrinsics Example Code
- Quick Start
  - Compile
  - Run
  - Clean
- Initialization Intrinsics
  - Initialization with Scalar Values
  - Loading Data from Memory
- Arithmetic Intrinsics
  - Addition and Subtraction
  - Multiplication and Division
  - Fused Multiply and Add (FMA)
- Permuting and Shuffling
  - Permuting
- Copyright
Quick Start
Compile
$ make
All the source files in src/ are compiled, and the resulting binaries are written to bin/ in each subdirectory.
Run
To compile and run everything in one step, execute this command at the project root directory:
$ make run
The output of every program is then printed to your terminal :tada:
Clean
Cleaning up is easy: just enter the following command at the project root directory:
$ make clean
All the output files are then removed.
Initialization Intrinsics
Initialization with Scalar Values
- setzero
  - _mm256_setzero_ps
  - _mm256_setzero_pd
  - _mm256_setzero_si256
- set1
  - _mm256_set1_ps
  - _mm256_set1_pd
  - _mm256_set1_epi32
  - _mm256_set1_epi64x
  - _mm256_set1_epi16
  - _mm256_set1_epi8
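A standalone sketch of the setzero and set1 families (not the repository's own example code). The helper names `fill_with` and `fill_zero` are hypothetical, and the GCC/Clang `target` attribute is assumed so the snippet builds without global `-mavx` flags; it still needs an AVX-capable CPU.

```c
#include <immintrin.h>

/* Hypothetical helper: broadcast `value` into all eight float lanes
 * with _mm256_set1_ps, then store the lanes to out[0..7]. */
__attribute__((target("avx")))
void fill_with(float value, float out[8])
{
    __m256 v = _mm256_set1_ps(value);
    _mm256_storeu_ps(out, v);
}

/* Hypothetical helper: store eight zeros produced by _mm256_setzero_ps. */
__attribute__((target("avx")))
void fill_zero(float out[8])
{
    _mm256_storeu_ps(out, _mm256_setzero_ps());
}
```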
- set
  - _mm256_set_ps
  - _mm256_set_pd
  - _mm256_set_epi32
  - _mm256_set_epi64x
  - _mm256_set_epi16
  - _mm256_set_epi8
  - _mm256_set_m128
  - _mm256_set_m128d
  - _mm256_set_m128i
- setr
  - _mm256_setr_ps
  - _mm256_setr_pd
  - _mm256_setr_epi32
  - _mm256_setr_epi64x
  - _mm256_setr_epi16
  - _mm256_setr_epi8
  - _mm256_setr_m128
  - _mm256_setr_m128d
  - _mm256_setr_m128i
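The set and setr families differ only in argument order: set lists the elements from the highest index down to element 0, while setr ("reversed") lists them in memory order. The hypothetical helper below stores both results so the difference is visible (GCC/Clang `target` attribute assumed).

```c
#include <immintrin.h>
#include <stdint.h>

/* Hypothetical helper: the same argument list passed to set and setr. */
__attribute__((target("avx")))
void set_vs_setr(int32_t from_set[8], int32_t from_setr[8])
{
    /* set: first argument is element 7, last argument is element 0 */
    __m256i a = _mm256_set_epi32(7, 6, 5, 4, 3, 2, 1, 0);
    /* setr: first argument is element 0, last argument is element 7 */
    __m256i b = _mm256_setr_epi32(7, 6, 5, 4, 3, 2, 1, 0);
    _mm256_storeu_si256((__m256i *)from_set, a);   /* {0, 1, ..., 7} */
    _mm256_storeu_si256((__m256i *)from_setr, b);  /* {7, 6, ..., 0} */
}
```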
Loading Data from Memory
- load
  - _mm256_load_ps
  - _mm256_load_pd
  - _mm256_load_si256
- loadu
  - _mm256_loadu_ps
  - _mm256_loadu_pd
  - _mm256_loadu_si256
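`_mm256_load_*` requires a 32-byte-aligned address and can fault otherwise, whereas `_mm256_loadu_*` accepts any address. A sketch with a hypothetical `sum8` helper (GCC/Clang `target` attribute assumed) that picks the load based on the pointer's alignment:

```c
#include <immintrin.h>
#include <stdint.h>

/* Hypothetical helper: sum the eight floats starting at p. */
__attribute__((target("avx")))
float sum8(const float *p)
{
    __m256 v;
    if (((uintptr_t)p & 31) == 0)
        v = _mm256_load_ps(p);   /* aligned load: p must be 32-byte aligned */
    else
        v = _mm256_loadu_ps(p);  /* unaligned load: any address works */

    float lanes[8];
    _mm256_storeu_ps(lanes, v);
    float s = 0.0f;
    for (int i = 0; i < 8; ++i)
        s += lanes[i];
    return s;
}
```

C11 `_Alignas(32)` or `_mm_malloc` are common ways to obtain memory suitable for the aligned variant.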
- maskload
  - _mm_maskload_ps
  - _mm_maskload_pd
  - _mm256_maskload_ps
  - _mm256_maskload_pd
  - _mm_maskload_epi32 (AVX2)
  - _mm_maskload_epi64 (AVX2)
  - _mm256_maskload_epi32 (AVX2)
  - _mm256_maskload_epi64 (AVX2)
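Masked loads are handy for the leftover elements at the end of an array: a lane is read only when the most significant bit of its 32-bit mask element is set, and masked-off lanes come back as zero without touching memory. The sketch below uses a hypothetical `load_tail` helper and builds the mask with `_mm256_cmpgt_epi32`, so it needs AVX2 (GCC/Clang `target` attribute assumed).

```c
#include <immintrin.h>

/* Hypothetical helper: copy the first n (0..8) floats of p into out and
 * zero the remaining lanes, never reading past p[n - 1]. */
__attribute__((target("avx2")))
void load_tail(const float *p, int n, float out[8])
{
    const __m256i lane = _mm256_setr_epi32(0, 1, 2, 3, 4, 5, 6, 7);
    /* lane i of the mask is all ones (sign bit set) when i < n, else zero */
    __m256i mask = _mm256_cmpgt_epi32(_mm256_set1_epi32(n), lane);
    _mm256_storeu_ps(out, _mm256_maskload_ps(p, mask));
}
```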
Arithmetic Intrinsics
Addition and Subtraction
- add
  - _mm256_add_ps
  - _mm256_add_pd
  - _mm256_add_epi8 (AVX2)
  - _mm256_add_epi16 (AVX2)
  - _mm256_add_epi32 (AVX2)
  - _mm256_add_epi64 (AVX2)
- sub
  - _mm256_sub_ps
  - _mm256_sub_pd
  - _mm256_sub_epi8 (AVX2)
  - _mm256_sub_epi16 (AVX2)
  - _mm256_sub_epi32 (AVX2)
  - _mm256_sub_epi64 (AVX2)
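A short sketch of the element-wise add and sub intrinsics, with hypothetical helper names and the GCC/Clang `target` attribute assumed. The integer versions wrap around on overflow rather than saturating.

```c
#include <immintrin.h>
#include <stdint.h>

/* Hypothetical helper: sum[i] = a[i] + b[i] for eight floats (AVX). */
__attribute__((target("avx")))
void add8_ps(const float a[8], const float b[8], float sum[8])
{
    _mm256_storeu_ps(sum, _mm256_add_ps(_mm256_loadu_ps(a), _mm256_loadu_ps(b)));
}

/* Hypothetical helper: diff[i] = x[i] - y[i] for eight int32 values (AVX2).
 * Overflow wraps, e.g. INT32_MIN - 1 gives INT32_MAX. */
__attribute__((target("avx2")))
void sub8_epi32(const int32_t x[8], const int32_t y[8], int32_t diff[8])
{
    __m256i vx = _mm256_loadu_si256((const __m256i *)x);
    __m256i vy = _mm256_loadu_si256((const __m256i *)y);
    _mm256_storeu_si256((__m256i *)diff, _mm256_sub_epi32(vx, vy));
}
```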
- adds
  - _mm256_adds_epi8 (AVX2)
  - _mm256_adds_epi16 (AVX2)
  - _mm256_adds_epu8 (AVX2)
  - _mm256_adds_epu16 (AVX2)
- subs
  - _mm256_subs_epi8 (AVX2)
  - _mm256_subs_epi16 (AVX2)
  - _mm256_subs_epu8 (AVX2)
  - _mm256_subs_epu16 (AVX2)
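The adds/subs variants saturate instead of wrapping: results are clamped to the range of the element type. A sketch with unsigned bytes (hypothetical `byte_math` helper, AVX2, GCC/Clang `target` attribute assumed):

```c
#include <immintrin.h>
#include <stdint.h>

/* Hypothetical helper comparing wrapping and saturating arithmetic
 * on 32 unsigned bytes. */
__attribute__((target("avx2")))
void byte_math(const uint8_t a[32], const uint8_t b[32],
               uint8_t wrapped[32], uint8_t sat_add[32], uint8_t sat_sub[32])
{
    __m256i va = _mm256_loadu_si256((const __m256i *)a);
    __m256i vb = _mm256_loadu_si256((const __m256i *)b);
    /* wrapping: e.g. 200 + 100 = 300 becomes 300 - 256 = 44 */
    _mm256_storeu_si256((__m256i *)wrapped, _mm256_add_epi8(va, vb));
    /* saturating: e.g. 200 + 100 is clamped to 255 */
    _mm256_storeu_si256((__m256i *)sat_add, _mm256_adds_epu8(va, vb));
    /* saturating: e.g. 100 - 200 is clamped to 0 */
    _mm256_storeu_si256((__m256i *)sat_sub, _mm256_subs_epu8(vb, va));
}
```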
- hadd
  - _mm256_hadd_ps
  - _mm256_hadd_pd
  - _mm256_hadd_epi16 (AVX2)
  - _mm256_hadd_epi32 (AVX2)
- hsub
  - _mm256_hsub_ps
  - _mm256_hsub_pd
  - _mm256_hsub_epi16 (AVX2)
  - _mm256_hsub_epi32 (AVX2)
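The 256-bit horizontal intrinsics operate within each 128-bit lane rather than across the whole register. A sketch (hypothetical `hadd_demo` helper, GCC/Clang `target` attribute assumed) showing where each pairwise sum ends up; `_mm256_hsub_ps` uses the same layout with differences (a0 - a1, ...).

```c
#include <immintrin.h>

/* Hypothetical helper. Result layout of _mm256_hadd_ps(a, b):
 * {a0+a1, a2+a3, b0+b1, b2+b3, a4+a5, a6+a7, b4+b5, b6+b7} */
__attribute__((target("avx")))
void hadd_demo(const float a[8], const float b[8], float out[8])
{
    __m256 r = _mm256_hadd_ps(_mm256_loadu_ps(a), _mm256_loadu_ps(b));
    _mm256_storeu_ps(out, r);
}
```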
- hadds
  - _mm256_hadds_epi16 (AVX2)
- hsubs
  - _mm256_hsubs_epi16 (AVX2)
- addsub
  - _mm256_addsub_ps
  - _mm256_addsub_pd
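`_mm256_addsub_ps` subtracts in even-indexed elements and adds in odd-indexed ones, a pattern that fits interleaved complex numbers (real, imaginary, real, ...). A sketch with a hypothetical helper (GCC/Clang `target` attribute assumed):

```c
#include <immintrin.h>

/* Hypothetical helper: out[i] = a[i] - b[i] for even i, a[i] + b[i] for odd i. */
__attribute__((target("avx")))
void addsub_demo(const float a[8], const float b[8], float out[8])
{
    _mm256_storeu_ps(out, _mm256_addsub_ps(_mm256_loadu_ps(a), _mm256_loadu_ps(b)));
}
```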
Multiplication and Division
- mul
  - _mm256_mul_ps
  - _mm256_mul_pd
  - _mm256_mul_epi32 (AVX2)
  - _mm256_mul_epu32 (AVX2)
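`_mm256_mul_epi32` and `_mm256_mul_epu32` read only the low 32 bits of each 64-bit element (the even 32-bit slots) and produce four full 64-bit products, so the multiplication cannot overflow. A sketch with a hypothetical helper (AVX2, GCC/Clang `target` attribute assumed):

```c
#include <immintrin.h>
#include <stdint.h>

/* Hypothetical helper: out[k] = (int64_t)a[2k] * b[2k] for k = 0..3.
 * The odd 32-bit elements of a and b are ignored. */
__attribute__((target("avx2")))
void mul_even_epi32(const int32_t a[8], const int32_t b[8], int64_t out[4])
{
    __m256i va = _mm256_loadu_si256((const __m256i *)a);
    __m256i vb = _mm256_loadu_si256((const __m256i *)b);
    _mm256_storeu_si256((__m256i *)out, _mm256_mul_epi32(va, vb));
}
```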
- mullo
  - _mm256_mullo_epi16 (AVX2)
  - _mm256_mullo_epi32 (AVX2)
- mulhi
  - _mm256_mulhi_epi16 (AVX2)
  - _mm256_mulhi_epu16 (AVX2)
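Multiplying two 16-bit values gives a 32-bit product; `_mm256_mullo_epi16` keeps its low 16 bits and `_mm256_mulhi_epi16` its signed high 16 bits. The hypothetical helper below recombines the two halves with scalar code to show that relationship; it is a sketch for illustration, not an efficient widening multiply (GCC/Clang `target` attribute assumed).

```c
#include <immintrin.h>
#include <stdint.h>

/* Hypothetical helper: out[i] = a[i] * b[i] as a full 32-bit product,
 * rebuilt from the low and high 16-bit halves. */
__attribute__((target("avx2")))
void mul16_full(const int16_t a[16], const int16_t b[16], int32_t out[16])
{
    __m256i va = _mm256_loadu_si256((const __m256i *)a);
    __m256i vb = _mm256_loadu_si256((const __m256i *)b);
    int16_t lo[16], hi[16];
    _mm256_storeu_si256((__m256i *)lo, _mm256_mullo_epi16(va, vb));
    _mm256_storeu_si256((__m256i *)hi, _mm256_mulhi_epi16(va, vb));
    for (int i = 0; i < 16; ++i)
        /* high half is signed; low half is an unsigned 16-bit pattern */
        out[i] = (int32_t)hi[i] * 65536 + (uint16_t)lo[i];
}
```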
- mulhrs
  - _mm256_mulhrs_epi16 (AVX2)
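`_mm256_mulhrs_epi16` is a rounded Q15 fixed-point multiply: for each pair it computes (a * b + 0x4000) >> 15 on the 32-bit product. With 16384 representing 0.5 in Q15, 0.5 x 0.5 yields 8192 (0.25). Hypothetical helper, AVX2, GCC/Clang `target` attribute assumed:

```c
#include <immintrin.h>
#include <stdint.h>

/* Hypothetical helper: rounded Q15 multiply of sixteen int16 pairs. */
__attribute__((target("avx2")))
void q15_mul(const int16_t a[16], const int16_t b[16], int16_t out[16])
{
    __m256i va = _mm256_loadu_si256((const __m256i *)a);
    __m256i vb = _mm256_loadu_si256((const __m256i *)b);
    _mm256_storeu_si256((__m256i *)out, _mm256_mulhrs_epi16(va, vb));
}
```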
- div
  - _mm256_div_ps
  - _mm256_div_pd
Fused Multiply and Add (FMA)
- fmadd
  - _mm_fmadd_ps (FMA)
  - _mm_fmadd_pd (FMA)
  - _mm256_fmadd_ps (FMA)
  - _mm256_fmadd_pd (FMA)
  - _mm_fmadd_ss (FMA)
  - _mm_fmadd_sd (FMA)
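The fmadd family computes a * b + c with a single rounding. A classic use is SAXPY (y = a * x + y); the sketch below assumes n is a multiple of 8 to stay short, and `saxpy` is a hypothetical name. It needs FMA (plus AVX for the 256-bit types); the GCC/Clang `target` attribute is assumed.

```c
#include <immintrin.h>
#include <stddef.h>

/* Hypothetical helper: y[i] = a * x[i] + y[i]; n must be a multiple of 8
 * (a full version would handle the remainder, e.g. with maskload). */
__attribute__((target("avx,fma")))
void saxpy(size_t n, float a, const float *x, float *y)
{
    __m256 va = _mm256_set1_ps(a);
    for (size_t i = 0; i < n; i += 8) {
        __m256 vx = _mm256_loadu_ps(x + i);
        __m256 vy = _mm256_loadu_ps(y + i);
        _mm256_storeu_ps(y + i, _mm256_fmadd_ps(va, vx, vy));
    }
}
```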
- fmsub
  - _mm_fmsub_ps (FMA)
  - _mm_fmsub_pd (FMA)
  - _mm256_fmsub_ps (FMA)
  - _mm256_fmsub_pd (FMA)
  - _mm_fmsub_ss (FMA)
  - _mm_fmsub_sd (FMA)
- fnmadd
  - _mm_fnmadd_ps (FMA)
  - _mm_fnmadd_pd (FMA)
  - _mm256_fnmadd_ps (FMA)
  - _mm256_fnmadd_pd (FMA)
  - _mm_fnmadd_ss (FMA)
  - _mm_fnmadd_sd (FMA)
- fnmsub
  - _mm_fnmsub_ps (FMA)
  - _mm_fnmsub_pd (FMA)
  - _mm256_fnmsub_ps (FMA)
  - _mm256_fnmsub_pd (FMA)
  - _mm_fnmsub_ss (FMA)
  - _mm_fnmsub_sd (FMA)
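The fmsub, fnmadd, and fnmsub variants only change signs: fmsub is a * b - c, fnmadd is -(a * b) + c, and fnmsub is -(a * b) - c. The scalar `_ss` forms make this easy to check (hypothetical helper, GCC/Clang `target` attribute assumed):

```c
#include <immintrin.h>

/* Hypothetical helper: evaluate the three sign variants on scalars. */
__attribute__((target("avx,fma")))
void fma_signs(float a, float b, float c, float out[3])
{
    __m128 va = _mm_set_ss(a), vb = _mm_set_ss(b), vc = _mm_set_ss(c);
    out[0] = _mm_cvtss_f32(_mm_fmsub_ss(va, vb, vc));   /*  a*b - c   */
    out[1] = _mm_cvtss_f32(_mm_fnmadd_ss(va, vb, vc));  /* -(a*b) + c */
    out[2] = _mm_cvtss_f32(_mm_fnmsub_ss(va, vb, vc));  /* -(a*b) - c */
}
```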
- fmaddsub
  - _mm_fmaddsub_ps (FMA)
  - _mm_fmaddsub_pd (FMA)
  - _mm256_fmaddsub_ps (FMA)
  - _mm256_fmaddsub_pd (FMA)
- fmsubadd
  - _mm_fmsubadd_ps (FMA)
  - _mm_fmsubadd_pd (FMA)
  - _mm256_fmsubadd_ps (FMA)
  - _mm256_fmsubadd_pd (FMA)
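fmaddsub and fmsubadd alternate the sign of c by element index: fmaddsub subtracts c in even elements and adds it in odd ones, and fmsubadd does the opposite. Hypothetical helper, GCC/Clang `target` attribute assumed:

```c
#include <immintrin.h>

/* Hypothetical helper:
 * addsub[i] = a*b - c (even i), a*b + c (odd i)
 * subadd[i] = a*b + c (even i), a*b - c (odd i) */
__attribute__((target("avx,fma")))
void fmaddsub_demo(const float a[8], const float b[8], const float c[8],
                   float addsub[8], float subadd[8])
{
    __m256 va = _mm256_loadu_ps(a), vb = _mm256_loadu_ps(b), vc = _mm256_loadu_ps(c);
    _mm256_storeu_ps(addsub, _mm256_fmaddsub_ps(va, vb, vc));
    _mm256_storeu_ps(subadd, _mm256_fmsubadd_ps(va, vb, vc));
}
```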
Permuting and Shuffling
Permuting
- permute
  - _mm_permute_ps
  - _mm_permute_pd
  - _mm256_permute_ps
  - _mm256_permute_pd
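`_mm256_permute_ps` rearranges elements inside each 128-bit lane using an immediate, and applies the same control to both lanes. With `_MM_SHUFFLE(0, 1, 2, 3)` each lane is reversed (hypothetical helper, GCC/Clang `target` attribute assumed):

```c
#include <immintrin.h>

/* Hypothetical helper: {0,1,2,3,4,5,6,7} -> {3,2,1,0,7,6,5,4} */
__attribute__((target("avx")))
void reverse_within_lanes(const float in[8], float out[8])
{
    __m256 v = _mm256_loadu_ps(in);
    /* element i of each lane takes element (3 - i) of the same lane */
    _mm256_storeu_ps(out, _mm256_permute_ps(v, _MM_SHUFFLE(0, 1, 2, 3)));
}
```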
- permute4x64
  - _mm256_permute4x64_pd (AVX2)
  - _mm256_permute4x64_epi64 (AVX2)
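Unlike permute, `_mm256_permute4x64_pd` (AVX2) can move 64-bit elements across the 128-bit lane boundary, so the control `_MM_SHUFFLE(0, 1, 2, 3)` reverses all four doubles. Hypothetical helper, GCC/Clang `target` attribute assumed:

```c
#include <immintrin.h>

/* Hypothetical helper: {a0, a1, a2, a3} -> {a3, a2, a1, a0} */
__attribute__((target("avx2")))
void reverse4_pd(const double in[4], double out[4])
{
    __m256d v = _mm256_loadu_pd(in);
    _mm256_storeu_pd(out, _mm256_permute4x64_pd(v, _MM_SHUFFLE(0, 1, 2, 3)));
}
```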
- permute2f128
  - _mm256_permute2f128_ps
  - _mm256_permute2f128_pd
  - _mm256_permute2f128_si256
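`_mm256_permute2f128_ps` works on whole 128-bit halves: bits 1:0 of the immediate choose the low half of the result and bits 5:4 the high half (0 = a.low, 1 = a.high, 2 = b.low, 3 = b.high). A sketch (hypothetical helper, GCC/Clang `target` attribute assumed) that swaps the halves of one vector and joins the low halves of two:

```c
#include <immintrin.h>

/* Hypothetical helper demonstrating two immediates. */
__attribute__((target("avx")))
void swap_and_join(const float a[8], const float b[8], float swapped[8], float lows[8])
{
    __m256 va = _mm256_loadu_ps(a), vb = _mm256_loadu_ps(b);
    /* 0x01: low half <- a.high, high half <- a.low */
    _mm256_storeu_ps(swapped, _mm256_permute2f128_ps(va, va, 0x01));
    /* 0x20: low half <- a.low, high half <- b.low */
    _mm256_storeu_ps(lows, _mm256_permute2f128_ps(va, vb, 0x20));
}
```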
- permutevar
  - _mm_permutevar_ps
  - _mm_permutevar_pd
  - _mm256_permutevar_ps
  - _mm256_permutevar_pd
- permutevar8x32
  - _mm256_permutevar8x32_ps (AVX2)
  - _mm256_permutevar8x32_epi32 (AVX2)
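`_mm256_permutevar8x32_ps` (AVX2) takes its indices from a vector and can place any of the eight elements at any position, so a full cross-lane reverse is a single call. Hypothetical helper, GCC/Clang `target` attribute assumed:

```c
#include <immintrin.h>

/* Hypothetical helper: out[i] = in[7 - i] */
__attribute__((target("avx2")))
void reverse8_ps(const float in[8], float out[8])
{
    /* result element i is taken from source element idx[i] */
    const __m256i idx = _mm256_setr_epi32(7, 6, 5, 4, 3, 2, 1, 0);
    _mm256_storeu_ps(out, _mm256_permutevar8x32_ps(_mm256_loadu_ps(in), idx));
}
```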
- shuffle
  - _mm256_shuffle_ps
  - _mm256_shuffle_pd
  - _mm256_shuffle_epi32 (AVX2)
  - _mm256_shuffle_epi8 (AVX2)
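`_mm256_shuffle_ps` builds each 128-bit lane from two sources: the first two elements come from a and the last two from b, chosen by the immediate. With `_MM_SHUFFLE(2, 0, 2, 0)` it picks the even-indexed elements of each lane. Hypothetical helper, GCC/Clang `target` attribute assumed:

```c
#include <immintrin.h>

/* Hypothetical helper: out = {a0, a2, b0, b2, a4, a6, b4, b6} */
__attribute__((target("avx")))
void even_elements(const float a[8], const float b[8], float out[8])
{
    __m256 r = _mm256_shuffle_ps(_mm256_loadu_ps(a), _mm256_loadu_ps(b),
                                 _MM_SHUFFLE(2, 0, 2, 0));
    _mm256_storeu_ps(out, r);
}
```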
- shufflehi
  - _mm256_shufflehi_epi16 (AVX2)
- shufflelo
  - _mm256_shufflelo_epi16 (AVX2)
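`_mm256_shufflelo_epi16` reorders only the lower four 16-bit elements of each 128-bit lane and copies the upper four unchanged; `_mm256_shufflehi_epi16` does the same for the upper four. Applying both with `_MM_SHUFFLE(0, 1, 2, 3)` reverses every group of four (hypothetical helper, AVX2, GCC/Clang `target` attribute assumed):

```c
#include <immintrin.h>
#include <stdint.h>

/* Hypothetical helper: lo_only gets the shufflelo result, both gets
 * shufflehi applied on top of it. */
__attribute__((target("avx2")))
void reverse_quads_epi16(const int16_t in[16], int16_t lo_only[16], int16_t both[16])
{
    __m256i v  = _mm256_loadu_si256((const __m256i *)in);
    __m256i lo = _mm256_shufflelo_epi16(v, _MM_SHUFFLE(0, 1, 2, 3));
    __m256i hl = _mm256_shufflehi_epi16(lo, _MM_SHUFFLE(0, 1, 2, 3));
    _mm256_storeu_si256((__m256i *)lo_only, lo);
    _mm256_storeu_si256((__m256i *)both, hl);
}
```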
Copyright
This project is licensed under the BSD 3-Clause license.