libvisual
Core (LV::Video) Fix alpha blending of 32-bit videos (#230)
This is a rewrite of the buggy 32-bit LV::Video alpha blending code to deal with arithmetic underflows/overflows and the use of an uninitialized register in the MMX implementation (#230).
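For readers unfamiliar with the failure mode, here is a minimal plain C sketch of a per-channel blend that cannot underflow or overflow. It is not the code from this PR: the function names, the `0xAARRGGBB` pixel layout, the rounding formula, and the choice to keep the destination alpha are all assumptions made for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Blend one 8-bit channel: round((alpha*src + (255-alpha)*dst) / 255).
 * Every intermediate stays in unsigned 32-bit range (max 65407), so
 * there is no signed subtraction that could underflow. */
static inline uint8_t blend_channel(uint8_t dst, uint8_t src, uint8_t alpha)
{
    uint32_t t = (uint32_t)alpha * src + (uint32_t)(255 - alpha) * dst + 128;
    return (uint8_t)((t + (t >> 8)) >> 8);   /* exact rounded divide by 255 */
}

/* Assumed layout: 0xAARRGGBB. The source alpha drives the blend and the
 * destination alpha is kept as-is (an illustrative choice). */
static inline uint32_t blend_pixel_argb32(uint32_t dst, uint32_t src)
{
    uint8_t  a = (uint8_t)(src >> 24);
    uint32_t r = blend_channel((dst >> 16) & 0xFF, (src >> 16) & 0xFF, a);
    uint32_t g = blend_channel((dst >> 8) & 0xFF, (src >> 8) & 0xFF, a);
    uint32_t b = blend_channel(dst & 0xFF, src & 0xFF, a);
    return (dst & 0xFF000000u) | (r << 16) | (g << 8) | b;
}

/* Blend one row of `width` pixels from src onto dst, in place. */
void blend_row_argb32(uint32_t *dst, const uint32_t *src, size_t width)
{
    assert(dst != NULL && src != NULL);
    for (size_t i = 0; i < width; i++)
        dst[i] = blend_pixel_argb32(dst[i], src[i]);
}
```

With this formula, alpha 0 returns the destination unchanged and alpha 255 returns the source exactly, which a plain `>> 8` does not guarantee.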
Note that GCC and Clang on x86-64 (recent versions only?) emit SSE instructions instead of MMX for this code. GCC 12.2 uses the XMM registers, while Clang sticks to the MM registers but throws in a `pshuflw`.
Perhaps it's time to move on and use SSE2 (introduced in 2000 with the Pentium 4) to process 2 pixels at once, or maybe even 4. This would require larger memory alignment and complicate the code a bit, since it has to handle row widths that are not a multiple of the number of pixels processed per iteration.
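To make the trade-off concrete, here is a rough sketch of what a 2-pixels-at-once SSE2 loop with a scalar tail could look like. It is only an illustration, not a proposal for the actual implementation: the names, the `0xAARRGGBB` layout, the rounding formula, and keeping the destination alpha are assumptions. It uses unaligned 64-bit loads, so it sidesteps the alignment question at some cost in throughput, and it falls back to the scalar path when SSE2 is not available.

```c
#include <stddef.h>
#include <stdint.h>
#if defined(__SSE2__)
#include <emmintrin.h>
#endif

/* Scalar reference, also used for the leftover pixel of odd-width rows. */
static uint8_t scalar_channel(uint8_t d, uint8_t s, uint8_t a)
{
    uint32_t t = (uint32_t)a * s + (uint32_t)(255 - a) * d + 128;
    return (uint8_t)((t + (t >> 8)) >> 8);   /* rounded divide by 255 */
}

static uint32_t scalar_pixel(uint32_t dst, uint32_t src)
{
    uint8_t  a = (uint8_t)(src >> 24);
    uint32_t r = scalar_channel((dst >> 16) & 0xFF, (src >> 16) & 0xFF, a);
    uint32_t g = scalar_channel((dst >> 8) & 0xFF, (src >> 8) & 0xFF, a);
    uint32_t b = scalar_channel(dst & 0xFF, src & 0xFF, a);
    return (dst & 0xFF000000u) | (r << 16) | (g << 8) | b;
}

void blend_row_argb32_x2(uint32_t *dst, const uint32_t *src, size_t width)
{
    size_t i = 0;
#if defined(__SSE2__)
    const __m128i zero  = _mm_setzero_si128();
    const __m128i c255  = _mm_set1_epi16(255);
    const __m128i c128  = _mm_set1_epi16(128);
    const __m128i amask = _mm_set1_epi32((int)0xFF000000u);

    for (; i + 2 <= width; i += 2) {
        /* Load 2 pixels (64 bits) and widen each byte to a 16-bit lane. */
        __m128i d8 = _mm_loadl_epi64((const __m128i *)(dst + i));
        __m128i s8 = _mm_loadl_epi64((const __m128i *)(src + i));
        __m128i d  = _mm_unpacklo_epi8(d8, zero);
        __m128i s  = _mm_unpacklo_epi8(s8, zero);

        /* Broadcast each pixel's source alpha (lane 3 of each half). */
        __m128i a = _mm_shufflelo_epi16(s, _MM_SHUFFLE(3, 3, 3, 3));
        a = _mm_shufflehi_epi16(a, _MM_SHUFFLE(3, 3, 3, 3));

        /* t = a*s + (255-a)*d + 128; at most 65153, so it fits in 16 bits. */
        __m128i t = _mm_add_epi16(_mm_mullo_epi16(a, s),
                                  _mm_mullo_epi16(_mm_sub_epi16(c255, a), d));
        t = _mm_add_epi16(t, c128);
        t = _mm_srli_epi16(_mm_add_epi16(t, _mm_srli_epi16(t, 8)), 8);

        /* Narrow back to bytes and restore the destination alpha. */
        __m128i out = _mm_packus_epi16(t, zero);
        out = _mm_or_si128(_mm_andnot_si128(amask, out),
                           _mm_and_si128(amask, d8));
        _mm_storel_epi64((__m128i *)(dst + i), out);
    }
#endif
    /* Tail: whatever the SIMD loop did not cover (or the whole row). */
    for (; i < width; i++)
        dst[i] = scalar_pixel(dst[i], src[i]);
}
```

Going to 4 pixels per iteration would mean full 128-bit loads and two unpack halves per register, with a tail of up to 3 pixels instead of 1.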
Here is a link to the plain C and SIMD code in Godbolt.
@hartwork, any chance you could look at this again?
@kaixiong I hope to find time to in the coming days.