Example code for Intel AVX / AVX2 intrinsics.

AVX / AVX2 Intrinsics Example Code

  • AVX / AVX2 Intrinsics Example Code
    • Quick Start
      • Compile
      • Run
      • Clean
    • Initialization Intrinsics
      • Initialization with Scalar Values
      • Loading Data from Memory
    • Arithmetic Intrinsics
      • Addition and Subtraction
      • Multiplication and Division
      • Fused Multiply and Add (FMA)
    • Permuting and Shuffling
      • Permuting
    • Copyright

Quick Start

Compile

$ make

Every source file under src/ is compiled, and the resulting binaries are written to the bin/ directory of each subdirectory.

Run

Compile and run everything in one step! From the project root directory, execute this command:

$ make run

The output of every program is then printed to your terminal :tada:

Clean

To remove all the output files, enter the following command at the project root directory:

$ make clean

All the output files are then removed.

Initialization Intrinsics

Initialization with Scalar Values

  • setzero

    • _mm256_setzero_ps
    • _mm256_setzero_pd
    • _mm256_setzero_si256
  • set1

    • _mm256_set1_ps
    • _mm256_set1_pd
    • _mm256_set1_epi32
    • _mm256_set1_epi64x
    • _mm256_set1_epi16
    • _mm256_set1_epi8
  • set

    • _mm256_set_ps
    • _mm256_set_pd
    • _mm256_set_epi32
    • _mm256_set_epi64x
    • _mm256_set_epi16
    • _mm256_set_epi8
    • _mm256_set_m128
    • _mm256_set_m128d
    • _mm256_set_m128i
  • setr

    • _mm256_setr_ps
    • _mm256_setr_pd
    • _mm256_setr_epi32
    • _mm256_setr_epi64x
    • _mm256_setr_epi16
    • _mm256_setr_epi8
    • _mm256_setr_m128
    • _mm256_setr_m128d
    • _mm256_setr_m128i
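
The difference between set and setr is easy to get backwards, so here is a minimal sketch of it. It assumes GCC or Clang; the helper names (demo_set, demo_setr, demo_set1) and the TARGET_AVX macro are hypothetical and not part of this repository. _mm256_storeu_ps is used only to copy the vector out for inspection.

```c
#include <immintrin.h>

/* Assumption: GCC/Clang. The attribute enables AVX for these functions so the
   sketch builds even without -mavx; on other compilers it expands to nothing. */
#if defined(__GNUC__)
#define TARGET_AVX __attribute__((target("avx")))
#else
#define TARGET_AVX
#endif

/* set: arguments are listed from the highest element (7) down to element 0. */
TARGET_AVX void demo_set(float out[8]) {
    __m256 v = _mm256_set_ps(7.0f, 6.0f, 5.0f, 4.0f, 3.0f, 2.0f, 1.0f, 0.0f);
    _mm256_storeu_ps(out, v);
}

/* setr ("reversed"): arguments are listed in memory order, element 0 first. */
TARGET_AVX void demo_setr(float out[8]) {
    __m256 v = _mm256_setr_ps(0.0f, 1.0f, 2.0f, 3.0f, 4.0f, 5.0f, 6.0f, 7.0f);
    _mm256_storeu_ps(out, v);
}

/* set1: broadcasts one scalar to all eight elements. */
TARGET_AVX void demo_set1(float out[8], float x) {
    _mm256_storeu_ps(out, _mm256_set1_ps(x));
}
```

Both demo_set and demo_setr leave out[i] equal to i, which shows that the two intrinsics build the same vector from opposite argument orders.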

Loading Data from Memory

  • load

    • _mm256_load_ps
    • _mm256_load_pd
    • _mm256_load_si256
  • loadu

    • _mm256_loadu_ps
    • _mm256_loadu_pd
    • _mm256_loadu_si256
  • maskload

    • _mm_maskload_ps
    • _mm_maskload_pd
    • _mm256_maskload_ps
    • _mm256_maskload_pd
    • _mm_maskload_epi32 AVX2
    • _mm_maskload_epi64 AVX2
    • _mm256_maskload_epi32 AVX2
    • _mm256_maskload_epi64 AVX2
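
As a rough illustration of how the three load families differ, the sketch below adds an aligned and an unaligned array, then uses a masked load to read only the first n elements of a short array. It assumes GCC or Clang; add_aligned_unaligned, load_first_n and TARGET_AVX are hypothetical names chosen for this example.

```c
#include <immintrin.h>
#include <stdint.h>

/* Assumption: GCC/Clang. Enables AVX per function so no -mavx flag is needed. */
#if defined(__GNUC__)
#define TARGET_AVX __attribute__((target("avx")))
#else
#define TARGET_AVX
#endif

/* load requires a 32-byte-aligned address; loadu accepts any address. */
TARGET_AVX void add_aligned_unaligned(const float *aligned32, const float *any,
                                      float out[8]) {
    __m256 a = _mm256_load_ps(aligned32);
    __m256 b = _mm256_loadu_ps(any);
    _mm256_storeu_ps(out, _mm256_add_ps(a, b));
}

/* maskload reads element i only when the top bit of mask element i is set;
   every other element of the result is zero. Here n is expected in 0..8. */
TARGET_AVX void load_first_n(const float *src, int n, float out[8]) {
    int32_t m[8];
    for (int i = 0; i < 8; ++i) {
        m[i] = (i < n) ? -1 : 0; /* -1 has all bits set, including the top bit */
    }
    __m256i mask = _mm256_loadu_si256((const __m256i *)m);
    _mm256_storeu_ps(out, _mm256_maskload_ps(src, mask));
}
```

A masked load like this is one way to handle the tail of an array whose length is not a multiple of eight, since the masked-out elements are not read from memory.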

Arithmetic Intrinsics

Addition and Subtraction

  • add

    • _mm256_add_ps
    • _mm256_add_pd
    • _mm256_add_epi8 AVX2
    • _mm256_add_epi16 AVX2
    • _mm256_add_epi32 AVX2
    • _mm256_add_epi64 AVX2
  • sub

    • _mm256_sub_ps
    • _mm256_sub_pd
    • _mm256_sub_epi8 AVX2
    • _mm256_sub_epi16 AVX2
    • _mm256_sub_epi32 AVX2
    • _mm256_sub_epi64 AVX2
  • adds

    • _mm256_adds_epi8 AVX2
    • _mm256_adds_epi16 AVX2
    • _mm256_adds_epu8 AVX2
    • _mm256_adds_epu16 AVX2
  • subs

    • _mm256_subs_epi8 AVX2
    • _mm256_subs_epi16 AVX2
    • _mm256_subs_epu8 AVX2
    • _mm256_subs_epu16 AVX2
  • hadd

    • _mm256_hadd_ps
    • _mm256_hadd_pd
    • _mm256_hadd_epi16 AVX2
    • _mm256_hadd_epi32 AVX2
  • hsub

    • _mm256_hsub_ps
    • _mm256_hsub_pd
    • _mm256_hsub_epi16 AVX2
    • _mm256_hsub_epi32 AVX2
  • hadds

    • _mm256_hadds_epi16 AVX2
  • hsubs

    • _mm256_hsubs_epi16 AVX2
  • addsub

    • _mm256_addsub_ps
    • _mm256_addsub_pd
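
Two behaviours in this group tend to surprise: add wraps around on overflow while adds saturates, and hadd works inside each 128-bit half rather than across the whole vector. The following sketch shows both. It assumes GCC or Clang; add_u8, hadd_f32 and TARGET_AVX2 are hypothetical names used only here.

```c
#include <immintrin.h>
#include <stdint.h>

/* Assumption: GCC/Clang. Enables AVX2 per function so no -mavx2 flag is needed. */
#if defined(__GNUC__)
#define TARGET_AVX2 __attribute__((target("avx2")))
#else
#define TARGET_AVX2
#endif

/* Adds 32 unsigned bytes twice: once with wraparound, once with saturation. */
TARGET_AVX2 void add_u8(const uint8_t a[32], const uint8_t b[32],
                        uint8_t wrap[32], uint8_t sat[32]) {
    __m256i va = _mm256_loadu_si256((const __m256i *)a);
    __m256i vb = _mm256_loadu_si256((const __m256i *)b);
    _mm256_storeu_si256((__m256i *)wrap, _mm256_add_epi8(va, vb));  /* mod 256 */
    _mm256_storeu_si256((__m256i *)sat, _mm256_adds_epu8(va, vb));  /* clamps at 255 */
}

/* Horizontal add. The result layout is, per 128-bit half:
   out = { a0+a1, a2+a3, b0+b1, b2+b3,  a4+a5, a6+a7, b4+b5, b6+b7 } */
TARGET_AVX2 void hadd_f32(const float a[8], const float b[8], float out[8]) {
    __m256 va = _mm256_loadu_ps(a);
    __m256 vb = _mm256_loadu_ps(b);
    _mm256_storeu_ps(out, _mm256_hadd_ps(va, vb));
}
```

With every byte of a set to 200 and every byte of b set to 100, the wrapping sum is 44 (300 mod 256) while the saturating sum is 255.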

Multiplication and Division

  • mul

    • _mm256_mul_ps
    • _mm256_mul_pd
    • _mm256_mul_epi32 AVX2
    • _mm256_mul_epu32 AVX2
  • mullo

    • _mm256_mullo_epi16 AVX2
    • _mm256_mullo_epi32 AVX2
  • mulhi

    • _mm256_mulhi_epi16 AVX2
    • _mm256_mulhi_epu16 AVX2
  • mulhrs

    • _mm256_mulhrs_epi16 AVX2
  • div

    • _mm256_div_ps
    • _mm256_div_pd
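
The integer multiplies differ mainly in which part of the full product they keep. As a hedged sketch: mullo and mulhi return the low and high 16 bits of each 32-bit product, while mul_epi32 multiplies only the even-indexed 32-bit elements and keeps the full 64-bit result. It assumes GCC or Clang; mul_i16, mul_even_i32 and TARGET_AVX2 are hypothetical names.

```c
#include <immintrin.h>
#include <stdint.h>

/* Assumption: GCC/Clang. Enables AVX2 per function so no -mavx2 flag is needed. */
#if defined(__GNUC__)
#define TARGET_AVX2 __attribute__((target("avx2")))
#else
#define TARGET_AVX2
#endif

/* For each of the 16 signed 16-bit pairs, stores the low and the high
   16 bits of the 32-bit product separately. */
TARGET_AVX2 void mul_i16(const int16_t a[16], const int16_t b[16],
                         int16_t lo[16], int16_t hi[16]) {
    __m256i va = _mm256_loadu_si256((const __m256i *)a);
    __m256i vb = _mm256_loadu_si256((const __m256i *)b);
    _mm256_storeu_si256((__m256i *)lo, _mm256_mullo_epi16(va, vb));
    _mm256_storeu_si256((__m256i *)hi, _mm256_mulhi_epi16(va, vb));
}

/* mul_epi32 uses only the low 32 bits of each 64-bit element, i.e. the
   even-indexed inputs: out[i] = (int64_t)a[2*i] * b[2*i]. */
TARGET_AVX2 void mul_even_i32(const int32_t a[8], const int32_t b[8],
                              int64_t out[4]) {
    __m256i va = _mm256_loadu_si256((const __m256i *)a);
    __m256i vb = _mm256_loadu_si256((const __m256i *)b);
    _mm256_storeu_si256((__m256i *)out, _mm256_mul_epi32(va, vb));
}
```

For example, 300 * 400 = 120000 does not fit in 16 bits; mulhi yields 1 and mullo yields the remaining low bits, so the pair together reconstructs the full product.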

Fused Multiply and Add (FMA)

  • fmadd

    • _mm_fmadd_ps FMA
    • _mm_fmadd_pd FMA
    • _mm256_fmadd_ps FMA
    • _mm256_fmadd_pd FMA
    • _mm_fmadd_ss FMA
    • _mm_fmadd_sd FMA
  • fmsub

    • _mm_fmsub_ps FMA
    • _mm_fmsub_pd FMA
    • _mm256_fmsub_ps FMA
    • _mm256_fmsub_pd FMA
    • _mm_fmsub_ss FMA
    • _mm_fmsub_sd FMA
  • fnmadd

    • _mm_fnmadd_ps FMA
    • _mm_fnmadd_pd FMA
    • _mm256_fnmadd_ps FMA
    • _mm256_fnmadd_pd FMA
    • _mm_fnmadd_ss FMA
    • _mm_fnmadd_sd FMA
  • fnmsub

    • _mm_fnmsub_ps FMA
    • _mm_fnmsub_pd FMA
    • _mm256_fnmsub_ps FMA
    • _mm256_fnmsub_pd FMA
    • _mm_fnmsub_ss FMA
    • _mm_fnmsub_sd FMA
  • fmaddsub

    • _mm_fmaddsub_ps FMA
    • _mm_fmaddsub_pd FMA
    • _mm256_fmaddsub_ps FMA
    • _mm256_fmaddsub_pd FMA
  • fmsubadd

    • _mm_fmsubadd_ps FMA
    • _mm_fmsubadd_pd FMA
    • _mm256_fmsubadd_ps FMA
    • _mm256_fmsubadd_pd FMA
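
The four basic FMA forms differ only in signs: fmadd computes a*b + c, fmsub computes a*b - c, fnmadd computes -(a*b) + c, and fnmsub computes -(a*b) - c. A minimal sketch of two of them follows, assuming GCC or Clang and a CPU with FMA support; axpy, residual8 and TARGET_FMA are hypothetical names for this example.

```c
#include <immintrin.h>
#include <stddef.h>

/* Assumption: GCC/Clang. Enables AVX and FMA per function so no -mfma flag is needed. */
#if defined(__GNUC__)
#define TARGET_FMA __attribute__((target("avx,fma")))
#else
#define TARGET_FMA
#endif

/* y[i] = a * x[i] + y[i], eight floats per step via fmadd. */
TARGET_FMA void axpy(float a, const float *x, float *y, size_t n) {
    __m256 va = _mm256_set1_ps(a);
    size_t i = 0;
    for (; i + 8 <= n; i += 8) {
        __m256 vx = _mm256_loadu_ps(x + i);
        __m256 vy = _mm256_loadu_ps(y + i);
        _mm256_storeu_ps(y + i, _mm256_fmadd_ps(va, vx, vy));
    }
    for (; i < n; ++i) { /* scalar tail for the remaining n % 8 elements */
        y[i] = a * x[i] + y[i];
    }
}

/* out[i] = c[i] - a[i] * b[i], written as fnmadd: -(a*b) + c. */
TARGET_FMA void residual8(const float a[8], const float b[8],
                          const float c[8], float out[8]) {
    __m256 va = _mm256_loadu_ps(a);
    __m256 vb = _mm256_loadu_ps(b);
    __m256 vc = _mm256_loadu_ps(c);
    _mm256_storeu_ps(out, _mm256_fnmadd_ps(va, vb, vc));
}
```

Each fused operation rounds once, after the multiply and add together, so results can differ in the last bit from a separate mul followed by add.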

Permuting and Shuffling

Permuting

  • permute

    • _mm_permute_ps
    • _mm_permute_pd
    • _mm256_permute_ps
    • _mm256_permute_pd
  • permute4x64

    • _mm256_permute4x64_pd AVX2
    • _mm256_permute4x64_epi64 AVX2
  • permute2f128

    • _mm256_permute2f128_ps
    • _mm256_permute2f128_pd
    • _mm256_permute2f128_si256
  • permutevar

    • _mm_permutevar_ps
    • _mm_permutevar_pd
    • _mm256_permutevar_ps
    • _mm256_permutevar_pd
  • permutevar8x32

    • _mm256_permutevar8x32_ps AVX2
    • _mm256_permutevar8x32_epi32 AVX2
  • shuffle

    • _mm256_shuffle_ps
    • _mm256_shuffle_pd
    • _mm256_shuffle_epi32 AVX2
    • _mm256_shuffle_epi8 AVX2
  • shufflehi

    • _mm256_shufflehi_epi16 AVX2
  • shufflelo

    • _mm256_shufflelo_epi16 AVX2
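
Most of the AVX permutes operate within each 128-bit half, while the AVX2 permutevar8x32 form can move elements across the whole vector. To illustrate, this sketch reverses eight floats in two ways: an in-lane permute followed by a half swap (AVX only), and a single cross-lane permute (AVX2). It assumes GCC or Clang; reverse_avx, reverse_avx2 and TARGET_AVX2 are hypothetical names.

```c
#include <immintrin.h>

/* Assumption: GCC/Clang. Enables AVX2 per function so no -mavx2 flag is needed. */
#if defined(__GNUC__)
#define TARGET_AVX2 __attribute__((target("avx2")))
#else
#define TARGET_AVX2
#endif

/* AVX-only reversal in two steps. */
TARGET_AVX2 void reverse_avx(const float in[8], float out[8]) {
    __m256 v = _mm256_loadu_ps(in);
    /* 0x1B = 0b00011011: within each 128-bit half, pick elements 3,2,1,0. */
    __m256 t = _mm256_permute_ps(v, 0x1B);
    /* 0x01: low half of the result = high half of t, high half = low half of t. */
    __m256 r = _mm256_permute2f128_ps(t, t, 0x01);
    _mm256_storeu_ps(out, r);
}

/* AVX2 reversal in one step: out[i] = in[idx[i]] across the whole vector. */
TARGET_AVX2 void reverse_avx2(const float in[8], float out[8]) {
    __m256 v = _mm256_loadu_ps(in);
    __m256i idx = _mm256_setr_epi32(7, 6, 5, 4, 3, 2, 1, 0);
    _mm256_storeu_ps(out, _mm256_permutevar8x32_ps(v, idx));
}
```

Both functions produce the same reversed vector; which one is faster depends on the target microarchitecture, so it is worth measuring rather than assuming.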

Copyright

This project is licensed under the BSD 3-Clause license.