lehrfempp
Profiling
It would be important to profile the example implementing a linear finite element solver for a full-featured elliptic boundary value problem (examples/ell_bvp_linfe) in order to identify performance bottlenecks in LehrFEM++. This example is currently in the lagr_fe_demo branch, but will be merged into master soon.
I've profiled ell_bvp_linfe as you suggested, on Windows on my laptop. Unfortunately, the results are not easy to share in a convenient form, so I've extracted the function calls made from main() into the following Excel file: https://www.dropbox.com/s/b4ifubddsf3i1el/Report20190307-2341_CallTreeSummary.xlsx?dl=0
As you can see, roughly:
- 29.89% of the time is spent generating the mesh hierarchies
- 17.72% is spent assembling the matrices
- 14.53% is spent solving the linear systems
- 10% is spent computing the error with respect to the exact solution (H1 seminorm)
- 8.57% is spent constructing the FESpaceLagrangeO1; I think this is mostly the assignment of DOFs to entities
- 3.85% is spent computing the error with respect to the exact solution (L2 norm)
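Adding up the listed shares (treating the "10%" as exact, although it is rounded) shows how much of the total runtime these six phases cover:

$$
29.89\,\% + 17.72\,\% + 14.53\,\% + 10\,\% + 8.57\,\% + 3.85\,\% = 84.56\,\%
$$

This leaves roughly 15% of the runtime for everything else called from main().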
Looking at it from the bottom up, i.e. at the functions in which the most time is spent exclusively (excluding calls to child functions), we get the following: https://www.dropbox.com/s/qijwj9y3907725o/Report20190307-2341_FunctionSummary.xlsx?dl=0
Here we can see that
- 15% of the time is spent in RTDynamicCast, which is the implementation of dynamic_cast. I assume that most of this is overhead introduced by ForwardIterator/RandomAccessIterator.
- 10% is spent in RtlpLowFragHeapAllocFromContext, which performs heap allocation. Further analysis shows that about 3% (out of these 10%) is allocation overhead related to ForwardIterator/RandomAccessIterator.
- 9.18% is spent in RtlFreeHeap, which frees heap memory.
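To illustrate why a type-erased iterator can show up as both RTDynamicCast and heap allocation/free in a profile, here is a minimal sketch of one common design. It is an assumption for illustration only: `AnyForwardIterator`, `SumTypeErased` and `SumPlain` are hypothetical names, and this is not LehrFEM++'s actual ForwardIterator implementation. Each iterator owns a heap-allocated implementation object, and every comparison recovers the concrete type via `dynamic_cast`.

```cpp
#include <cassert>
#include <memory>
#include <type_traits>
#include <vector>

// Hypothetical type-erased forward iterator over ints (NOT LehrFEM++'s class).
class AnyForwardIterator {
  struct ImplBase {
    virtual ~ImplBase() = default;
    virtual void Increment() = 0;
    virtual const int& Deref() const = 0;
    virtual bool Equals(const ImplBase& other) const = 0;
  };

  template <class It>
  struct Impl : ImplBase {
    explicit Impl(It it) : it_(it) {}
    void Increment() override { ++it_; }
    const int& Deref() const override { return *it_; }
    bool Equals(const ImplBase& other) const override {
      // Runtime type check on every comparison; with MSVC this is the
      // kind of call that appears as RTDynamicCast in a profile.
      const auto* o = dynamic_cast<const Impl*>(&other);
      return o != nullptr && o->it_ == it_;
    }
    It it_;
  };

 public:
  // Constrained so it never hijacks copy/move of AnyForwardIterator itself.
  template <class It, class = std::enable_if_t<
                          !std::is_same_v<std::decay_t<It>, AnyForwardIterator>>>
  explicit AnyForwardIterator(It it)
      : impl_(std::make_unique<Impl<It>>(it)) {}  // one heap allocation per iterator

  AnyForwardIterator& operator++() {
    impl_->Increment();  // virtual call per step
    return *this;
  }
  const int& operator*() const { return impl_->Deref(); }
  bool operator!=(const AnyForwardIterator& o) const {
    return !impl_->Equals(*o.impl_);
  }

 private:
  std::unique_ptr<ImplBase> impl_;  // freed (heap free) when the iterator dies
};

// Loop through the type-erased iterator: virtual calls + dynamic_cast per step.
int SumTypeErased(const std::vector<int>& v) {
  int sum = 0;
  AnyForwardIterator end(v.end());
  for (AnyForwardIterator it(v.begin()); it != end; ++it) sum += *it;
  return sum;
}

// Same loop with plain iterators: no allocation, no dynamic_cast.
int SumPlain(const std::vector<int>& v) {
  int sum = 0;
  for (int x : v) sum += x;
  return sum;
}
```

Both functions compute the same result; the difference is only the per-iterator allocation and the per-comparison `dynamic_cast` in the type-erased version.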
Thanks a lot for these figures.
- Of course, refinement is expensive, because it also accommodates local refinement. This is acceptable, because the overall complexity of refining a single mesh is still O(N), where N is the number of cells of the mesh.
- In the medium term, the iterator issue should be resolved by "ranges based on pointer arrays". This is planned for after the end of the term.
- I am surprised how efficient the linear solver is!
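The idea of "ranges based on pointer arrays" can be illustrated with a minimal sketch, assuming the mesh can hand out a contiguous array of pointers to its entities. `Entity`, `PointerRange` and `SumIndices` are hypothetical names chosen for illustration, not LehrFEM++ API. Because `begin()`/`end()` return raw pointers, iteration involves no virtual calls, no `dynamic_cast` and no heap allocation.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for a mesh entity (e.g. a cell).
struct Entity {
  int index;
};

// Hypothetical non-owning range over an array of pointers to T.
// The underlying pointer array must outlive the range.
template <class T>
class PointerRange {
 public:
  PointerRange(T* const* first, std::size_t n) : first_(first), last_(first + n) {}
  T* const* begin() const { return first_; }  // raw pointer as iterator
  T* const* end() const { return last_; }
  std::size_t size() const { return static_cast<std::size_t>(last_ - first_); }

 private:
  T* const* first_;
  T* const* last_;
};

// Range-based for loop over the entities; compiles down to pointer increments.
int SumIndices(PointerRange<const Entity> cells) {
  int sum = 0;
  for (const Entity* e : cells) sum += e->index;
  return sum;
}
```

A usage example: store the entities, collect pointers to them once, and pass a `PointerRange` around by value (it is just two pointers).

```cpp
std::vector<Entity> storage{{0}, {1}, {2}, {3}};
std::vector<const Entity*> ptrs;
for (const Entity& e : storage) ptrs.push_back(&e);
PointerRange<const Entity> cells(ptrs.data(), ptrs.size());
int total = SumIndices(cells);  // 0 + 1 + 2 + 3 = 6
```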