cglm
WIP: More Optimizations and SIMD fixes for MSVC & ARM
- [WIP] More SIMD optimizations
  - Matrix invert
  - Non-square matrices
  - Transforms
  - AABB
  - Frustum
  - SIMD for int types
  - ...
- [x] Fix compiling on MSVC + ARM32 (don't align types on MSVC + ARM32 due to "719: formal parameter with requested alignment of 16 won't be aligned")
- [x] msvc, simd: fix simd headers for _M_ARM64EC
- [x] arm, neon: fix neon support on GCC ARM
- [ ] Try interleaving independent instructions to take advantage of ILP where possible (compilers may already do this, but giving the hint manually is nice)
- [ ] Try to reduce port pressure where possible, e.g. use `_mm_blend_ps` in place of many `_mm_shuffle_ps` calls (this step may take time and needs to be profiled, e.g. Intel VTune can be used to find the bottleneck, plus speed tests...). Maybe in other PRs...
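
For readers unfamiliar with the ILP item, here is a minimal, portable sketch of what "interleaving independent instructions" can mean. It is not cglm code: `dot_serial` and `dot_interleaved` are hypothetical names invented for this illustration, and plain scalar C is used so it compiles on any target. The idea is to split one long dependency chain into several independent ones so the CPU can overlap them.

```c
#include <stddef.h>

/* Hypothetical helper (not part of cglm): a dot product with a single
 * accumulator. Every addition depends on the previous one, so the loop
 * forms one long dependency chain. */
float dot_serial(const float *a, const float *b, size_t n) {
  float  acc = 0.0f;
  size_t i;

  for (i = 0; i < n; i++)
    acc += a[i] * b[i];

  return acc;
}

/* Hypothetical helper (not part of cglm): the same computation with four
 * independent accumulators. The four multiply-add statements in the loop
 * body do not depend on each other, so an out-of-order core can execute
 * them in parallel. Note that the summation order differs from
 * dot_serial, so results may differ by floating-point rounding. */
float dot_interleaved(const float *a, const float *b, size_t n) {
  float  acc0 = 0.0f, acc1 = 0.0f, acc2 = 0.0f, acc3 = 0.0f;
  size_t i = 0;

  /* main loop: four independent chains per iteration */
  for (; i + 4 <= n; i += 4) {
    acc0 += a[i + 0] * b[i + 0];
    acc1 += a[i + 1] * b[i + 1];
    acc2 += a[i + 2] * b[i + 2];
    acc3 += a[i + 3] * b[i + 3];
  }

  /* tail: handle the remaining 0..3 elements */
  for (; i < n; i++)
    acc0 += a[i] * b[i];

  /* combine the partial sums pairwise */
  return (acc0 + acc1) + (acc2 + acc3);
}
```

Whether the manual version is actually faster depends on the compiler and target, which is why the checklist items above call for profiling before committing to this kind of change.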