cgmath
Matrix3 multiplication is 4x slower in 0.18
AMD Ryzen 7 3700X, Windows 10 19044.1566, rustc 1.60.0-nightly (1e12aef3f 2022-02-13)
The other day I was testing sprite performance in my engine and noticed it was a lot lower than usual. I investigated and traced the slowdown to updating cgmath from 0.17 to 0.18.
With 0.17, my benchmark, which does 100k translations, rotations, and scales on a Matrix3, gives:
test bench_transform_matrix ... bench: 670,110 ns/iter (+/- 13,076)
With 0.18:
test bench_transform_matrix ... bench: 2,755,590 ns/iter (+/- 10,904)
The bench itself; code for the transform wrapper.
Since the transform happens for each sprite, the performance difference adds up quickly.
Using default features with -O2/-O3. The compiler target is x86_64-pc-windows-msvc.
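For readers without access to the linked bench, here is a rough sketch of the kind of workload described above: 100k translate, rotate, and scale compositions on a 3x3 matrix. It does not use cgmath; `Mat3`, `sprite_transform`, and the timing loop are hypothetical stand-ins, and the numbers it prints are not comparable to the `cargo bench` figures in this thread.

```rust
use std::hint::black_box;
use std::time::Instant;

/// Hypothetical row-major 3x3 matrix; a stand-in for cgmath's Matrix3<f32>.
#[derive(Clone, Copy, Debug, PartialEq)]
struct Mat3([[f32; 3]; 3]);

impl Mat3 {
    fn translation(x: f32, y: f32) -> Self {
        Mat3([[1.0, 0.0, x], [0.0, 1.0, y], [0.0, 0.0, 1.0]])
    }

    fn rotation(radians: f32) -> Self {
        let (s, c) = radians.sin_cos();
        Mat3([[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]])
    }

    fn scale(sx: f32, sy: f32) -> Self {
        Mat3([[sx, 0.0, 0.0], [0.0, sy, 0.0], [0.0, 0.0, 1.0]])
    }

    /// Plain 3x3 matrix product; this is the hot path in the workload.
    #[inline]
    fn mul(self, rhs: Mat3) -> Mat3 {
        let mut out = [[0.0f32; 3]; 3];
        for r in 0..3 {
            for c in 0..3 {
                out[r][c] = (0..3).map(|k| self.0[r][k] * rhs.0[k][c]).sum();
            }
        }
        Mat3(out)
    }
}

/// Builds one sprite transform as translate * rotate * scale.
fn sprite_transform(i: u32) -> Mat3 {
    let t = i as f32;
    Mat3::translation(t, -t)
        .mul(Mat3::rotation(t * 0.001))
        .mul(Mat3::scale(2.0, 2.0))
}

fn main() {
    let start = Instant::now();
    for i in 0..100_000u32 {
        // black_box keeps the optimizer from discarding the work.
        black_box(sprite_transform(black_box(i)));
    }
    println!("100k transforms took {:?}", start.elapsed());
}
```

Because each sprite pays for two matrix products, any per-call overhead in the multiplication (such as a missed inline) is multiplied by the sprite count.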
Wow that's quite concerning! Thank you for bringing this up.
Comparing the benchmarks provided by the crate shows a similar (3x) increase:
cargo bench --feature rand _bench_matrix3_mul_m (0.18.0)
>> test _bench_matrix3_mul_m ... bench: 13 ns/iter (+/- 0)
cargo bench _bench_matrix3_mul_m (0.17.0)
>> test _bench_matrix3_mul_m ... bench: 4 ns/iter (+/- 0)
There was no significant increase for any of the other Matrix3 benchmarks.
It seems the problem is already fixed on the master branch; I'm getting 4 ns results there.
Found the issue:
https://github.com/rustgd/cgmath/blob/637c566cc2141203d8d99c03e7ab770796c44f5f/src/vector.rs#L311-L337
This is on v0.18.0. The addition of default_fn! prevents #[inline] from working, which is causing the slowdown. This was fixed in #548 by moving the #[inline] into the macro.
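To illustrate the shape of that fix, here is a minimal sketch in which the macro itself emits #[inline] on the function it generates, so the hint no longer depends on an attribute written at the call site. The `default_fn!` below is a simplified, hypothetical stand-in written for this example (the real macro in cgmath does more), and `Vec3` is an illustrative type, not cgmath's.

```rust
use std::ops::Add;

// Simplified stand-in for a `default_fn!`-style macro (assumption: the real
// cgmath macro body differs). The point is that `#[inline]` is part of the
// expansion, so it is attached directly to the generated `fn`.
macro_rules! default_fn {
    { $($tt:tt)* } => {
        #[inline]
        fn $($tt)*
    };
}

// Illustrative vector type, not the cgmath one.
#[derive(Clone, Copy, Debug, PartialEq)]
struct Vec3 {
    x: f32,
    y: f32,
    z: f32,
}

impl Add for Vec3 {
    type Output = Vec3;

    // Before the fix, the pattern was roughly `#[inline]` written above the
    // macro call; here the attribute comes from inside the macro instead.
    default_fn! {
        add(self, rhs: Vec3) -> Vec3 {
            Vec3 {
                x: self.x + rhs.x,
                y: self.y + rhs.y,
                z: self.z + rhs.z,
            }
        }
    }
}

fn main() {
    let a = Vec3 { x: 1.0, y: 2.0, z: 3.0 };
    let b = Vec3 { x: 4.0, y: 5.0, z: 6.0 };
    println!("{:?}", a + b);
}
```

Small operator impls like this are usually only cheap when they are inlined into the caller, which would explain why losing the hint shows up so clearly in a tight matrix multiplication loop.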
Great, thank you for the investigation!