Nicholas Frechette
Continuing the previous comment. I decided to measure on my old Pixel 7 while I'm at it. | Pixel 7 Light | 30% | 60% | 90% | | ---------...
I implemented two more variants:
* Using a switch statement to unpack 4 values at a time
* Using VM style tail call dispatch to unpack 4 values at a...
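To illustrate the switch-based variant, here is a minimal sketch, not the actual ACL code. It assumes the switch selects on a per-group bit width and that values are packed MSB first; `read_bits`, `unpack4_fixed`, and `unpack4` are hypothetical names introduced only for this example.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical helper: reads `num_bits` (1..16) bits starting at `bit_offset`,
// most significant bit first. Written for clarity, not speed.
inline uint32_t read_bits(const uint8_t* data, size_t bit_offset, uint32_t num_bits) {
    uint32_t value = 0;
    for (uint32_t i = 0; i < num_bits; ++i) {
        const size_t bit = bit_offset + i;
        const uint32_t b = (data[bit / 8] >> (7 - (bit % 8))) & 1u;
        value = (value << 1) | b;
    }
    return value;
}

// Unpacks 4 consecutive N-bit values. N is a compile-time constant so the
// compiler is free to unroll and specialize each instantiation.
template <uint32_t N>
inline void unpack4_fixed(const uint8_t* data, size_t bit_offset, uint32_t out[4]) {
    for (uint32_t i = 0; i < 4; ++i)
        out[i] = read_bits(data, bit_offset + i * N, N);
}

// Assumed dispatch: one switch per group of 4 values, keyed on the bit width.
// Returns false for widths this sketch does not handle.
inline bool unpack4(const uint8_t* data, size_t bit_offset, uint32_t num_bits, uint32_t out[4]) {
    switch (num_bits) {
    case 4:  unpack4_fixed<4>(data, bit_offset, out);  return true;
    case 8:  unpack4_fixed<8>(data, bit_offset, out);  return true;
    case 12: unpack4_fixed<12>(data, bit_offset, out); return true;
    case 16: unpack4_fixed<16>(data, bit_offset, out); return true;
    default: return false;
    }
}
```

The point of the structure is that the branch is paid once per 4 values rather than once per value; whether that wins in practice depends on how predictable the bit widths are.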
I revisited constant rotation unpacking. The baseline implementation (v0) from ACL 2.1 is quite simple: we unpack 16 rotations at a time, and to do so we have a small...
I've added SIMD support to vec3 unpacking, which provides a modest 20% speed boost. It will also make it easier to unroll the loops that call it and to migrate to AVX.
I added BTB cache flushing to the benchmark (disabled by default), which makes it easier to see branch-sensitive changes. Changes inspired by: https://blog.cloudflare.com/branch-predictor/ A new variant to unpack vec3...
A number of other minor improvements to vec3 unpacking, saving a few instructions and registers.
Some inspiration for future improvements: * Shift left/right with multiply: https://mastodon.gamedev.place/@rygorous/109799623402856305
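As a rough scalar illustration of the general identity behind shifting with a multiply (a sketch of the well-known trick, not a transcription of the linked post): a left shift by `n` equals a multiply by `2^n`, and a logical right shift of a 16-bit value by `n` equals the high 16 bits of its product with `2^(16-n)`. The function names are made up for this example.

```cpp
#include <cstdint>

// Left shift as a multiply: x << n == x * 2^n (mod 2^32), for n in [0, 31].
inline uint32_t shl_via_mul(uint32_t x, uint32_t n) {
    return x * (uint32_t(1) << n);
}

// Logical right shift of a 16-bit value as a "multiply high":
// (x * 2^(16 - n)) >> 16 == x >> n, for n in [1, 16].
// With n >= 1 the scale factor fits in 16 bits, which is what a 16-bit
// SIMD lane would require.
inline uint16_t shr_via_mulhi(uint16_t x, uint32_t n) {
    const uint32_t scale = uint32_t(1) << (16 - n);
    return uint16_t((uint32_t(x) * scale) >> 16);
}
```

This matters on instruction sets that lack per-lane variable shifts: the shift amounts become a vector of power-of-two multipliers, and every lane can then use a different one with a single multiply.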
This concludes the optimization work for ACL v2.2. I spent quite a few months on this and got some good gains and insights. More work remains but we are on the...
Some notes on splines. Great source here: https://www.youtube.com/watch?v=jvPPXbo87ds
Basic continuity:
- C0 continuity means that the spline positions are not disjoint
- C1 continuity means that the spline positions and...
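The first two continuity levels can be checked numerically where two segments meet. The sketch below assumes cubic Bézier segments with equal parameter ranges and uses the standard definition of C1 (matching positions and first derivatives); `Vec2`, `CubicBezier`, `is_c0`, and `is_c1` are illustrative names, not taken from the notes.

```cpp
#include <cmath>

struct Vec2 { float x, y; };
struct CubicBezier { Vec2 p0, p1, p2, p3; };

inline bool nearly_equal(Vec2 a, Vec2 b, float eps = 1e-5f) {
    return std::fabs(a.x - b.x) <= eps && std::fabs(a.y - b.y) <= eps;
}

// C0: the end of segment `a` coincides with the start of segment `b`.
inline bool is_c0(const CubicBezier& a, const CubicBezier& b) {
    return nearly_equal(a.p3, b.p0);
}

// C1: C0, and the first derivatives match at the joint.
// For a cubic Bezier, B'(1) = 3 * (p3 - p2) and B'(0) = 3 * (p1 - p0).
inline bool is_c1(const CubicBezier& a, const CubicBezier& b) {
    const Vec2 end_velocity{3.0f * (a.p3.x - a.p2.x), 3.0f * (a.p3.y - a.p2.y)};
    const Vec2 start_velocity{3.0f * (b.p1.x - b.p0.x), 3.0f * (b.p1.y - b.p0.y)};
    return is_c0(a, b) && nearly_equal(end_velocity, start_velocity);
}
```

In control point terms, C1 at a joint means the shared point sits exactly halfway between the neighbouring control points on either side.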