nalgebra
Improve performance in debug mode
This issue originates from a request made by @happenslol there.
Much effort has already been made to improve the performance of nalgebra when compiled with full optimization (O3 and link-time optimization enabled). Now we should investigate how to improve performance in debug mode (or with low optimization levels). Here are two possible leads:
- Lack of explicit SIMD. Adding explicit SIMD to nalgebra could yield significant performance improvements.
- Non-zero cost abstractions in debug mode. Unfortunately, abstractions are zero-cost only when optimizations are enabled. We should investigate in particular the occurrence and cost of:
  - Nested function calls.
  - Indexing (unnecessary bounds checking for statically-sized matrices).
  - Debug assertions?
  - Loops that are not unrolled.
  - Trivial checks and branching that are not removed by the compiler (for example, checks that are automatically evaluated at compile time and removed by the optimizing compiler for statically-sized vectors).
I am not sure yet whether we can make significant improvements using the stable version of the compiler. However, we could benefit from specialization and explicit SIMD using the nightly compiler.
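To make the indexing and unrolling items a bit more concrete, here is a minimal sketch in plain Rust. It does not use nalgebra, and the names `dot_indexed` and `dot_unrolled` are hypothetical, chosen only for illustration. The assumption is that, at opt-level 0, the loop version keeps its loop counter and per-element bounds checks, while the hand-unrolled version has less work for the compiler to remove. Whether that difference is measurable would need to be benchmarked.

```rust
/// Dot product written as a loop with indexing. With optimizations enabled the
/// compiler can typically prove `i < 3` and drop the bounds checks; without
/// optimizations the loop and the checks are expected to stay (assumption).
fn dot_indexed(a: &[f32; 3], b: &[f32; 3]) -> f32 {
    let mut acc = 0.0_f32;
    for i in 0..3 {
        acc += a[i] * b[i];
    }
    acc
}

/// Same computation unrolled by hand: no loop counter, only constant indices.
fn dot_unrolled(a: &[f32; 3], b: &[f32; 3]) -> f32 {
    // Only compiled when debug assertions are on (the default for dev builds).
    debug_assert_eq!(a.len(), b.len());
    a[0] * b[0] + a[1] * b[1] + a[2] * b[2]
}
```

Both functions return `32.0` for `[1.0, 2.0, 3.0]` and `[4.0, 5.0, 6.0]`, so they can be swapped in a benchmark to compare debug and release builds.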
Another thing I'd add to that list is the pattern of passing arguments by reference instead of value. Particularly for small structs like UnitComplex or Point3 that are only a few floats, forcing them out of registers into memory can be a drag on performance. Obviously changing that would be a breaking change, but it's worth keeping in mind.
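A rough sketch of the two calling styles, using a hypothetical `Point3` struct as a stand-in for the nalgebra type (it is not the real definition). Whether the by-value version actually keeps the floats in registers depends on the target ABI and the optimization level, so treat this as an illustration of the API difference rather than a measured result.

```rust
/// Hypothetical stand-in for a small type such as nalgebra's `Point3<f32>`.
#[derive(Clone, Copy, Debug, PartialEq)]
struct Point3 {
    x: f32,
    y: f32,
    z: f32,
}

/// By reference: the arguments are pointers, so the caller needs the points
/// to live somewhere addressable in memory.
fn midpoint_by_ref(a: &Point3, b: &Point3) -> Point3 {
    Point3 {
        x: (a.x + b.x) * 0.5,
        y: (a.y + b.y) * 0.5,
        z: (a.z + b.z) * 0.5,
    }
}

/// By value: the struct is `Copy`, so it is passed as a copy and the compiler
/// is free to keep its fields in registers where the ABI allows it.
fn midpoint_by_value(a: Point3, b: Point3) -> Point3 {
    Point3 {
        x: (a.x + b.x) * 0.5,
        y: (a.y + b.y) * 0.5,
        z: (a.z + b.z) * 0.5,
    }
}
```

Because the type is `Copy`, callers of the by-value version can still use their points afterwards, so the change would affect signatures more than call-site ergonomics.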
Is there a place where you're tracking the progress of performance improvements? Also, would it make sense to make a new "performance" label for GitHub issues?
@Gonkalbell No, this isn't tracked anywhere. I would love to be able to track performance as part of the CI builds. Does anybody know of a tool for automating this?
Beware that perf measurements on CI can be extremely noisy due to shared hardware.
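One common way to dampen that kind of noise is to repeat each measurement and report the median instead of a single timing. Below is a minimal sketch using only the standard library; `median_time` is a hypothetical helper, not part of nalgebra or of any CI tool, and it will not remove noise from shared hardware, only make a single outlier less likely to dominate.

```rust
use std::time::{Duration, Instant};

/// Runs `f` `runs` times and returns the median wall-clock duration.
/// For an even number of runs this returns the upper of the two middle samples.
fn median_time<F: FnMut()>(runs: usize, mut f: F) -> Duration {
    assert!(runs > 0, "need at least one run");
    let mut samples: Vec<Duration> = (0..runs)
        .map(|_| {
            let start = Instant::now();
            f();
            start.elapsed()
        })
        .collect();
    samples.sort();
    samples[samples.len() / 2]
}
```

A call such as `median_time(101, || { /* code under test */ })` would give one number per benchmark that can be compared across CI runs, ideally with a generous tolerance.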
It might be worth teaching people in docs to use this in their Cargo.toml when depending on nalgebra:
[profile.dev.package."*"]
opt-level = 3
In my experience, this significantly improves performance in some cases (not nalgebra specific but e.g. loading a 3D model goes from 1.4 s to 0.2 s) without meaningfully impacting incremental build times since deps only change rarely.
(I originally learned this trick from macroquad which recommends it right in the readme.)