minimp3

Published 20 hours ago •


Replace 8 PEXTRW with 1 MOVDQU in f32_to_s16

Open • WolfWings opened this issue 7 months ago • 1 comment

The existing code stores its result with a series of 8 sequential, unrolled PEXTRW instructions, a pattern that compilers generally cannot recognize and collapse into a single MOVDQU store.

As such, manually substituting the unaligned store intrinsic is an enormous performance win for SSE, with identical output.
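To illustrate the equivalence, here is a minimal sketch, not the actual minimp3 source. The helper names (`pack8`, `store8_extract`, `store8_movdqu`, `stores_match`) are hypothetical, and the scale-and-clamp step is an assumption about how the packed 16-bit vector is produced. It only shows that eight `_mm_extract_epi16` stores and one `_mm_storeu_si128` write the same bytes:

```c
#include <emmintrin.h>  /* SSE2: _mm_extract_epi16, _mm_storeu_si128 */
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: scale 8 floats to the int16 range, clamp,
 * and pack them into one vector of 8 signed 16-bit samples. */
static __m128i pack8(const float *in)
{
    const __m128 scale = _mm_set1_ps(32768.0f);
    const __m128 hi    = _mm_set1_ps(32767.0f);
    const __m128 lo    = _mm_set1_ps(-32768.0f);
    __m128 a = _mm_mul_ps(_mm_loadu_ps(in),     scale);
    __m128 b = _mm_mul_ps(_mm_loadu_ps(in + 4), scale);
    a = _mm_max_ps(_mm_min_ps(a, hi), lo);
    b = _mm_max_ps(_mm_min_ps(b, hi), lo);
    return _mm_packs_epi32(_mm_cvtps_epi32(a), _mm_cvtps_epi32(b));
}

/* The pattern described in the issue: one PEXTRW per output sample. */
static void store8_extract(const float *in, int16_t *out)
{
    __m128i pcm8 = pack8(in);
    out[0] = (int16_t)_mm_extract_epi16(pcm8, 0);
    out[1] = (int16_t)_mm_extract_epi16(pcm8, 1);
    out[2] = (int16_t)_mm_extract_epi16(pcm8, 2);
    out[3] = (int16_t)_mm_extract_epi16(pcm8, 3);
    out[4] = (int16_t)_mm_extract_epi16(pcm8, 4);
    out[5] = (int16_t)_mm_extract_epi16(pcm8, 5);
    out[6] = (int16_t)_mm_extract_epi16(pcm8, 6);
    out[7] = (int16_t)_mm_extract_epi16(pcm8, 7);
}

/* The proposed replacement: a single unaligned 128-bit store (MOVDQU). */
static void store8_movdqu(const float *in, int16_t *out)
{
    _mm_storeu_si128((__m128i *)out, pack8(in));
}

/* Returns 1 when both store strategies produce byte-identical output. */
int stores_match(void)
{
    const float in[8] = { 0.0f, 0.5f, -0.5f, 0.25f, -1.0f, 1.0f, 0.999f, -0.125f };
    int16_t a[8], b[8];
    store8_extract(in, a);
    store8_movdqu(in, b);
    return memcmp(a, b, sizeof a) == 0;
}
```

The unaligned store is used here because the output pointer is not assumed to be 16-byte aligned. The two versions only match when the extracted lanes are written in order to consecutive addresses, as they are in this sketch.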

Jul 25 '24 03:07 WolfWings