Performance of Eigen code and compiler flags
Motivation / Current Behaviour
Memilio uses Eigen with expression templates, and some functions explicitly return Eigen expressions by using `auto` as the return type.
Such code quickly becomes very complex for the compiler to optimize.
Enhancement description
In another project, we investigated the effect of several compiler flags on optimization, in particular on inlining, and looked at the automatic memory allocations that occur in Eigen expressions. This sped up the calculation in that project by about a factor of 10, at least (we optimized this in several steps, so I don't know the total before/after factor).
Before applying any of the suggestions below, we should measure the runtime with/without these for relevant use cases (the code in the other project might behave differently).
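A before/after comparison could be done with a small timing helper like the one below. This is only a sketch using the standard library; `median_runtime_seconds` is a hypothetical name and not an existing Memilio function. The median over several repetitions is used so that a single slow run (cold caches, scheduling noise) does not dominate the result.

```cpp
#include <algorithm>
#include <chrono>
#include <cstddef>
#include <vector>

// Hypothetical helper: runs `work` `repeats` times (repeats >= 1) and
// returns the median wall-clock time of a single run in seconds.
template <class F>
double median_runtime_seconds(F&& work, int repeats)
{
    std::vector<double> samples;
    samples.reserve(static_cast<std::size_t>(repeats));
    for (int i = 0; i < repeats; ++i) {
        const auto start = std::chrono::steady_clock::now();
        work();
        const auto stop = std::chrono::steady_clock::now();
        samples.push_back(std::chrono::duration<double>(stop - start).count());
    }
    // Partially sort so that the middle element is the median sample.
    const auto middle = samples.begin() + static_cast<std::ptrdiff_t>(samples.size() / 2);
    std::nth_element(samples.begin(), middle, samples.end());
    return *middle;
}
```

The callable passed as `work` should run a relevant use case (for example one simulation) and store its result somewhere observable, so that the compiler cannot remove the computation. Running the same binary once with and once without the flags or code changes gives the comparison.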
Optimizations that may have a major effect:
- Avoid memory allocation by using `noalias()`, see https://eigen.tuxfamily.org/dox/group__TopicAliasing.html : `matB.noalias() = matA * matA` does not allocate memory, whereas `matB = matA * matA` creates a temporary matrix for the result.
- (Avoid memory allocations through returning Eigen data types: seems already optimized at first sight)
- Avoid memory allocations for temporary arrays: in the other project we use `static thread_local Eigen::MatrixXd xyz; xyz.resize(n,m); xyz = ...` for temporary arrays. This is only relevant when it happens repeatedly in a loop without much computation per iteration. Warning: not sure if relevant here, but this is broken on Windows with MinGW + POSIX threads (compiler bug concerning multi-threading).
- Compiler flags for better inlining, see below.
- Compiler flags for the target CPU and floating-point optimizations to enable e.g. SIMD, see below.
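The aliasing and scratch-buffer points from the list can be illustrated without Eigen. The following is a minimal sketch in plain C++, not code from either project: `Matrix`, `multiply_into`, `scratch` and `square_in_place` are hypothetical names, and `std::vector<double>` is swapped in for `Eigen::MatrixXd`. It shows why writing a product directly into the destination is only safe when the destination does not alias the operands, and how a `static thread_local` buffer avoids repeated allocations.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical stand-in for Eigen::MatrixXd: a row-major n x n matrix.
using Matrix = std::vector<double>;

// Writes a * b directly into `out`, without a temporary for the result.
// Like `out.noalias() = a * b` in Eigen, this is only correct if `out`
// does not share storage with `a` or `b`.
void multiply_into(Matrix& out, const Matrix& a, const Matrix& b, std::size_t n)
{
    for (std::size_t i = 0; i < n; ++i) {
        for (std::size_t j = 0; j < n; ++j) {
            double sum = 0.0;
            for (std::size_t k = 0; k < n; ++k) {
                sum += a[i * n + k] * b[k * n + j];
            }
            out[i * n + j] = sum;
        }
    }
}

// Per-thread scratch buffer. The vector keeps its capacity between calls,
// so the heap is only touched when a larger size is requested.
Matrix& scratch(std::size_t size)
{
    static thread_local Matrix buffer;
    buffer.resize(size);
    return buffer;
}

// Computes a = a * a. Here the result aliases the operands, so it is
// written to the scratch buffer first and copied back afterwards.
void square_in_place(Matrix& a, std::size_t n)
{
    Matrix& tmp = scratch(n * n);
    multiply_into(tmp, a, a, n);
    a = tmp; // same size as before, so this copies into a's existing storage
}
```

One difference to keep in mind: as far as I know, `Eigen::MatrixXd::resize` reallocates whenever the total number of coefficients changes, so with Eigen the `thread_local` buffer only saves allocations while the requested size stays the same between calls.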
Additional context
Compiler flags for better inlining:
In the other project, we have a CMake option to enable the following GCC flags at the cost of higher compilation time:
```
--param inline-unit-growth=200 --param large-function-growth=1000 --param early-inlining-insns=2000 --param max-inline-insns-single=200 --param max-early-inliner-iterations=50 --param max-gcse-memory=13107200 --param large-function-insns=27000 --param large-stack-frame=1024 --param max-hoist-depth=300 --param max-vartrack-size=0
```
Without this, the compiler stopped inlining functions containing Eigen code way too early...
In Debug mode, we also add `--param early-inlining-insns=200 --param max-early-inliner-iterations=10 --param max-vartrack-size=0` in CI runs to get faster results with valgrind (if that's relevant).
Compiler flags for target CPU and floating-point optimizations:
Again, in the other project, we have an option to enable the following flags: `-fno-math-errno -funsafe-math-optimizations`
These flags are not supported by the Eigen SVD module and will produce incorrect SVDs. Therefore, one has to include different parts of Eigen with different compiler flags, like this:
```cpp
// include the content of Eigen/Dense individually, so we can fix the optimization options for Eigen/SVD (which does not work with unsafe-math-optimizations)
//#include <Eigen/Dense>
#include <Eigen/Core>
#include <Eigen/LU>
#include <Eigen/Cholesky>
#include <Eigen/QR>
// use Eigen/SVD but avoid unsupported compiler optimizations...
#if defined(__INTEL_COMPILER) || defined(__clang__)
# pragma float_control(precise, on, push)
#else
# pragma GCC push_options
# pragma GCC optimize("no-unsafe-math-optimizations")
#endif
#include <Eigen/SVD>
// the pop must use the same condition as the push above, otherwise clang
// would see a float_control push followed by a GCC pop_options
#if defined(__INTEL_COMPILER) || defined(__clang__)
# pragma float_control(pop)
#else
# pragma GCC pop_options
#endif
#include <Eigen/Geometry>
#include <Eigen/Eigenvalues>
```
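To see why such flags can change results at all, here is a small illustration that is unrelated to Eigen's SVD implementation: compensated (Kahan) summation relies on an expression that is algebraically zero, which a compiler is allowed to simplify away under `-funsafe-math-optimizations`. The sketch below protects only that one function with the same push/pop pattern as above. The function names are hypothetical, and I have not checked how a particular compiler version actually treats this code with the flags enabled.

```cpp
#include <vector>

// Plain left-to-right summation: terms much smaller than the running sum
// are lost to rounding.
double naive_sum(const std::vector<double>& values)
{
    double sum = 0.0;
    for (double v : values) {
        sum += v;
    }
    return sum;
}

// Keep strict floating-point semantics for the function below, using the
// same push/pop pattern as for Eigen/SVD.
#if defined(__INTEL_COMPILER) || defined(__clang__)
#  pragma float_control(precise, on, push)
#elif defined(__GNUC__)
#  pragma GCC push_options
#  pragma GCC optimize("no-unsafe-math-optimizations")
#endif

// Kahan summation: `compensation` carries the rounding error of the
// previous addition into the next one.
double kahan_sum(const std::vector<double>& values)
{
    double sum          = 0.0;
    double compensation = 0.0;
    for (double v : values) {
        const double y = v - compensation;
        const double t = sum + y;
        // Algebraically zero, but in floating point it recovers the part
        // of y that was lost when computing t.
        compensation = (t - sum) - y;
        sum          = t;
    }
    return sum;
}

#if defined(__INTEL_COMPILER) || defined(__clang__)
#  pragma float_control(pop)
#elif defined(__GNUC__)
#  pragma GCC pop_options
#endif
```

With strict IEEE semantics, summing `1.0` followed by a million copies of `1e-16` gives exactly `1.0` with `naive_sum`, while `kahan_sum` recovers the additional `1e-10`.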
Checklist
- [ ] Attached labels, especially loc:: or model:: labels.
- [ ] Linked to project
Hi @reneSchm and perhaps @dabele
I had a discussion with @mknaranja on the performance of some parts of the code and thought these general steps from a helicopter code in our group could be helpful here.
Probably rather low priority (I cannot add labels here, I suppose).