cglm
avx: AVX1 support for matrix inverse
cglm already has an AVX version of mat4_mul, but mat4_inv was missing one. I implemented an AVX1 version of matrix inverse.
After I upgrade my MacBook Pro I'll try to implement AVX2 + FMA versions too, but my current CPU does not support those, so I can't do that for now.
I tested mat4_inv on an Ivy Bridge CPU and got performance similar to the SSE version (not better), but the result may be different on newer CPUs. I'll try to remove some shuffles later to improve performance.
New functions:
- [x] glm_mat4_scale_avx(mat4 m, float s)
- [x] glm_mat4_inv_avx(mat4 mat, mat4 dest)
These are selected automatically if -mavx is set.
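To illustrate what "selected automatically" can look like, here is a minimal sketch of compile-time dispatch on the `__AVX__` macro, which GCC and Clang define when `-mavx` is set. It is not the code in this PR: `mat4_sketch`, `mat4_scale_scalar`, `mat4_scale_avx_sketch` and `mat4_scale_sketch` are hypothetical names, and the unaligned loads are an assumption made so the sketch works with any 4x4 float array.

```c
#if defined(__AVX__)
#  include <immintrin.h>
#endif

/* Hypothetical stand-in for a 4x4 float matrix type (16 contiguous floats). */
typedef float mat4_sketch[4][4];

/* Scalar fallback: multiply every element by s. */
void mat4_scale_scalar(mat4_sketch m, float s) {
  for (int i = 0; i < 4; i++)
    for (int j = 0; j < 4; j++)
      m[i][j] *= s;
}

#if defined(__AVX__)
/* AVX path: a 256-bit register holds 8 floats, so two multiplies
   cover all 16 elements of the matrix. */
void mat4_scale_avx_sketch(mat4_sketch m, float s) {
  float *p  = &m[0][0];
  __m256 vs = _mm256_set1_ps(s);  /* broadcast s to all 8 lanes */

  _mm256_storeu_ps(p,     _mm256_mul_ps(_mm256_loadu_ps(p),     vs));
  _mm256_storeu_ps(p + 8, _mm256_mul_ps(_mm256_loadu_ps(p + 8), vs));
}
#endif

/* Dispatcher: the implementation is chosen at compile time,
   so callers never need to know which path is active. */
void mat4_scale_sketch(mat4_sketch m, float s) {
#if defined(__AVX__)
  mat4_scale_avx_sketch(m, s);
#else
  mat4_scale_scalar(m, s);
#endif
}
```

Building the same file with and without `-mavx` switches the path taken by `mat4_scale_sketch()` without changing any call site. If the matrix type guarantees 32-byte alignment, aligned loads and stores (`_mm256_load_ps` / `_mm256_store_ps`) could be used instead.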
I'll try to optimize SIMD-ed functions with SSE3 and SSE4 later.
Coverage remained the same at 11.487% when pulling 01b93b0409c637aae8f6bbcea7b0440ac5108d84 on simd into 07e60bd0981840ec183d6e2ed23415ba63168107 on master.
glm_mat4_scale_avx() has been added to master and I'll try to re-implement the AVX version