minimp3

Replace 8*PEXTRW with 1*MOVDQU in f32_to_s16

Open • WolfWings opened this issue 7 months ago • 1 comment

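The existing code writes each group of eight converted samples with a series of eight sequential, unrolled PEXTRW instructions (`_mm_extract_epi16`), one per 16-bit output sample. Compilers generally cannot detect this pattern and merge it into a single MOVDQU (unaligned 128-bit store) instruction.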

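Manually replacing them with the unaligned store intrinsic (`_mm_storeu_si128`, which maps to MOVDQU) is therefore a large performance win for the SSE path, with identical output: the single 128-bit store writes the same eight 16-bit lanes, in the same order, to the same eight output slots.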
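A minimal sketch of the change, assuming an x86-64 target with SSE2 and a conversion core along the lines the issue describes (scale by 32768, clamp, convert to int32, pack to int16 with signed saturation). The names `convert8`, `f32_to_s16_scalar`, `f32_to_s16_pextrw`, and `f32_to_s16_storeu` are hypothetical and this is not minimp3's actual code; it only contrasts the eight-PEXTRW store with the single unaligned store.

```c
#include <assert.h>
#include <emmintrin.h> /* SSE2: _mm_packs_epi32, _mm_cvtps_epi32, _mm_storeu_si128 */
#include <stdint.h>
#include <string.h>

/* Hypothetical scalar tail: scale, clamp, round with the current MXCSR mode
   (round-to-nearest-even by default), matching _mm_cvtps_epi32. */
static int16_t f32_to_s16_scalar(float x)
{
    float v = x * 32768.0f;
    v = v > 32767.0f ? 32767.0f : v;
    v = v < -32768.0f ? -32768.0f : v;
    return (int16_t)_mm_cvtss_si32(_mm_set_ss(v));
}

/* Shared SIMD core: convert in[0..7] into eight saturated int16 lanes. */
static __m128i convert8(const float *in)
{
    const __m128 scale = _mm_set1_ps(32768.0f);
    const __m128 vmax = _mm_set1_ps(32767.0f);
    const __m128 vmin = _mm_set1_ps(-32768.0f);
    __m128 a = _mm_mul_ps(_mm_loadu_ps(in), scale);
    __m128 b = _mm_mul_ps(_mm_loadu_ps(in + 4), scale);
    /* Clamp before converting so out-of-range floats do not become INT_MIN. */
    a = _mm_max_ps(_mm_min_ps(a, vmax), vmin);
    b = _mm_max_ps(_mm_min_ps(b, vmax), vmin);
    return _mm_packs_epi32(_mm_cvtps_epi32(a), _mm_cvtps_epi32(b));
}

/* Before: eight PEXTRW extractions, each followed by a 16-bit store. */
static void f32_to_s16_pextrw(const float *in, int16_t *out, int n)
{
    int i = 0;
    for (; i + 8 <= n; i += 8) {
        __m128i pcm8 = convert8(in + i);
        /* The lane index of _mm_extract_epi16 must be a compile-time constant. */
        out[i + 0] = (int16_t)_mm_extract_epi16(pcm8, 0);
        out[i + 1] = (int16_t)_mm_extract_epi16(pcm8, 1);
        out[i + 2] = (int16_t)_mm_extract_epi16(pcm8, 2);
        out[i + 3] = (int16_t)_mm_extract_epi16(pcm8, 3);
        out[i + 4] = (int16_t)_mm_extract_epi16(pcm8, 4);
        out[i + 5] = (int16_t)_mm_extract_epi16(pcm8, 5);
        out[i + 6] = (int16_t)_mm_extract_epi16(pcm8, 6);
        out[i + 7] = (int16_t)_mm_extract_epi16(pcm8, 7);
    }
    for (; i < n; i++)
        out[i] = f32_to_s16_scalar(in[i]);
}

/* After: one unaligned 128-bit store (MOVDQU); lane k lands at out[i + k]. */
static void f32_to_s16_storeu(const float *in, int16_t *out, int n)
{
    int i = 0;
    for (; i + 8 <= n; i += 8)
        _mm_storeu_si128((__m128i *)(out + i), convert8(in + i));
    for (; i < n; i++)
        out[i] = f32_to_s16_scalar(in[i]);
}
```

Because `_mm_storeu_si128` has no alignment requirement, `out` does not need to be 16-byte aligned. On little-endian x86, 16-bit lane k is written at byte offset 2k, which is why both versions fill the output array identically.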

WolfWings • Jul 25 '24 03:07