Nicholas Frechette
This is also called symmetric rounding.
* 0.5 -> 1.0
* -0.5 -> -1.0
Shuffles are very common in real-time 3D applications, from regular 3x3, 3x4, and 4x4 matrix math to quaternion math. Over a whole application it might not contribute all that...
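One concrete case of shuffle-heavy 3D math is the cross product, which also appears inside quaternion multiplication. The sketch below models a 4-lane register as a plain struct and emulates a two-source shuffle using the immediate layout of SSE's `_mm_shuffle_ps`. The `vec4` type, the `shuffle`, `mul4`, `sub4` and `cross3` helpers, and the `SHUF` macro are hypothetical stand-ins, not a real intrinsic API.

```c
/* Hypothetical 4-lane float vector standing in for an SSE/NEON register. */
typedef struct { float v[4]; } vec4;

/* Hypothetical scalar model of a two-source shuffle using the same immediate
 * layout as SSE's _mm_shuffle_ps: lanes 0-1 are picked from a, lanes 2-3
 * from b, two bits of imm per lane. */
static vec4 shuffle(vec4 a, vec4 b, unsigned imm)
{
    vec4 r;
    r.v[0] = a.v[imm & 3u];
    r.v[1] = a.v[(imm >> 2) & 3u];
    r.v[2] = b.v[(imm >> 4) & 3u];
    r.v[3] = b.v[(imm >> 6) & 3u];
    return r;
}

/* Same argument order as _MM_SHUFFLE(z, y, x, w): w selects lane 0. */
#define SHUF(z, y, x, w) (((z) << 6) | ((y) << 4) | ((x) << 2) | (w))

static vec4 mul4(vec4 a, vec4 b)
{
    vec4 r;
    for (int i = 0; i < 4; ++i) r.v[i] = a.v[i] * b.v[i];
    return r;
}

static vec4 sub4(vec4 a, vec4 b)
{
    vec4 r;
    for (int i = 0; i < 4; ++i) r.v[i] = a.v[i] - b.v[i];
    return r;
}

/* cross(a, b) = a.yzx * b.zxy - a.zxy * b.yzx; lane w ends up 0 for finite inputs. */
static vec4 cross3(vec4 a, vec4 b)
{
    vec4 a_yzx = shuffle(a, a, SHUF(3, 0, 2, 1));
    vec4 a_zxy = shuffle(a, a, SHUF(3, 1, 0, 2));
    vec4 b_yzx = shuffle(b, b, SHUF(3, 0, 2, 1));
    vec4 b_zxy = shuffle(b, b, SHUF(3, 1, 0, 2));
    return sub4(mul4(a_yzx, b_zxy), mul4(a_zxy, b_yzx));
}
```

This version spends four shuffles per cross product. A common refinement computes `(a * b.yzx - a.yzx * b).yzx`, which gives the same result with three.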
My experience runs contrary to yours, @zeux. In my tests, FMA instructions are always slower under x64. I did notice fast-math automatically generating them and it is one of...
FMA is definitely not free on Ryzen. The picture isn't clear-cut. For example, Ryzen executes up to 5 instructions per cycle. addps takes 1 op, has a latency of...
Even if the and/andnot/or instructions dispatch in the same cycle, they will not retire in the same cycle because of the dependencies between them. Depending on the surrounding code, it's possible...
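The dependency structure behind the and/andnot/or select can be modelled in scalar C. This is a sketch, not SIMD code: each loop iteration stands for one 32-bit lane, `select4` is a hypothetical name, and the comments name the SSE intrinsics (`_mm_and_ps`, `_mm_andnot_ps`, `_mm_or_ps`) each step corresponds to. The first two operations are independent of each other, but the `or` must wait for both, so the sequence is two operations deep even in the best case.

```c
#include <stdint.h>

/* Bitwise select on four 32-bit lanes: where a mask bit is 1 take a,
 * where it is 0 take b. Mask lanes are usually all-ones or all-zeros. */
void select4(const uint32_t mask[4], const uint32_t a[4],
             const uint32_t b[4], uint32_t out[4])
{
    for (int i = 0; i < 4; ++i) {
        uint32_t t0 = mask[i] & a[i];   /* and:    independent of t1 */
        uint32_t t1 = ~mask[i] & b[i];  /* andnot: independent of t0 */
        out[i] = t0 | t1;               /* or:     waits for t0 and t1 */
    }
}
```

Because the select works bit by bit, a mask such as `0xFFFF0000` takes the high half from `a` and the low half from `b`.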
Shifts are commonly used with fixed-point arithmetic, but they are less common on 8-bit values (16/32/64-bit being the most common). I also imagine that it might be...
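For context on 8-bit shifts: SSE and AVX2 provide 16-, 32- and 64-bit lane shifts but no 8-bit one, so byte-lane shifts are typically emulated with a wider shift plus a mask. The sketch below shows the same trick on eight bytes packed in a `uint64_t` (SWAR); `shl_bytes` and `shr_bytes` are hypothetical names, and `n` is assumed to be in 0..7.

```c
#include <stdint.h>

/* Shift each of the 8 bytes packed in x left by n (0..7) bits, lane-wise.
 * Clearing the top n bits of every byte first means nothing can spill
 * into the neighbouring byte when the whole word is shifted. */
uint64_t shl_bytes(uint64_t x, unsigned n)
{
    uint64_t keep = (uint64_t)(0xFFu >> n) * 0x0101010101010101ull;
    return (x & keep) << n;
}

/* Logical right shift of each byte by n (0..7): clear the low n bits of
 * every byte so they cannot slide into the byte below. */
uint64_t shr_bytes(uint64_t x, unsigned n)
{
    uint64_t keep = (uint64_t)((0xFFu << n) & 0xFFu) * 0x0101010101010101ull;
    return (x & keep) >> n;
}
```

In SIMD code, the same pattern usually becomes one 16-bit lane shift followed by an AND with a splatted byte mask.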
I agree with @zeux that loading constants with fewer instructions is generally the way to go. It improves code density, and by reducing the number of instructions and registers...
Great stuff! I will definitely keep it open in a tab as a reference when porting my code. It is worth noting that on ARMv7 and ARM64, it...
@dtig As the discussion around shuffles has shown, some mappings will have poor or unexpected performance on some platforms, and this isn't easily avoided without more platform-specific intrinsics (which...
Cody Jones has begun this work [here](https://github.com/CodyDWJones/acl)