Ferrite.jl Performance regression tests

Introduce performance regression benchmarks through PkgBenchmark.jl on the most important pieces of Ferrite. Resolves #94 .

Benchmark Coverage

Common Lagrange assembly loops
Available hyperrectangular mesh generators
close!(dh) for the most common Lagrange problems (DofHandler + MixedDofHandler)
Application of Dirichlet BCs with both available strategies

What is missing

CI integration: Doing this should be addressed by a follow up PR. Currently a problem is that we cannot select subsets of the benchmark suite and running the complete suite really takes some time. Integration can be achieved via BenchmarkCI.jl
Micobenchmarks for utils: We have a plethora of utility functions which are not covered yet. We should at least cover non-lookup based utilities in future benchmarks.
Quadratic elements in 3D and embedded elements in 2D and 3D due to missing capabilities of the generators and dof handlers

Oct 12 '21 01:10 termi-official

Codecov Report

Patch coverage: 100.00% and no project coverage change.

Comparison is base (378f4ee) 92.97% compared to head (f7d9bdf) 92.97%.

:mega: This organization is not using Codecov’s GitHub App Integration. We recommend you install it so Codecov can continue to function properly for your repositories. Learn more

Additional details and impacted files

@@           Coverage Diff           @@
##           master     #388   +/-   ##
=======================================
  Coverage   92.97%   92.97%           
=======================================
  Files          28       28           
  Lines        4270     4270           
=======================================
  Hits         3970     3970           
  Misses        300      300

Impacted Files	Coverage Δ
src/Dofs/MixedDofHandler.jl	`86.09% <100.00%> (ø)`

Help us with your feedback. Take ten seconds to tell us how you rate us. Have a feature suggestion? Share it here.

:umbrella: View full report in Codecov by Sentry.
:loudspeaker: Do you have feedback about the report comment? Let us know in this issue.

Oct 12 '21 02:10 codecov-commenter

@fredrikekre is this what you had in mind?

Oct 12 '21 15:10 termi-official

Nice. Is there an easy way to run this on two different versions or something?

Oct 12 '21 17:10 fredrikekre

It should be as simple as calling PkgBenchmark.judge with the commit you want to compare against. I don't know if this works properly across branches, but if I understand correctly this is done by https://github.com/tkf/BenchmarkCI.jl via BenchmarkCI.judge (which compares the performance of the current branch head against the master).

However be careful, the suite currently takes some time. 1h tuning and I have not run the benchmark further than tuning yet. Benchmarking close! takes quite some effort and I haven't found a way to simplify this one yet.

Oct 12 '21 17:10 termi-official

1h tuning and I have not run the benchmark further than tuning yet.

Yikes, that seems a bit much. I think this is perhaps a little bit excessive to try all different combinations of dimensions etc. They all run pretty much the same code so it seems quite unlikely that one of them will randomly be super slow.

Very slow thorough benchmarks are worse than fast not-so thorough benchmarks because slow benchmarks will never be run.

Oct 13 '21 10:10 KristofferC

Yea, would be good to reduce that I think, but perhaps nice to still have the option to run everything maybe.

Oct 13 '21 11:10 fredrikekre

I agree on most points and I think having the option to run just some benchmarks would certainly help, but this feature is not yet implemented in PkgBenchmark (see https://github.com/JuliaCI/PkgBenchmark.jl/issues/132).

The rationale behind this was to provide infrastructure for possible new dispatches on close! and benchmarks with different dof handlers/grid types/.... I could reduce the loops to just include something like "2^dim element" grid in 3d with field-dim = spatial-dim (e.g. commonly incompressible elasticity). I would leave the current loops commented out such that we can reuse them once the feature is implemented in PkgBenchmark.jl

Oct 13 '21 11:10 termi-official

Workload is now 8 minutes for tuning and 5 minutes for the actual benchmark. ~~Not sure about the output tho. The vector-Laplacian assembly is sometimes faster than the scalar Laplacian~~. Nvm.

Oct 13 '21 22:10 termi-official

Workload is now 8 minutes for tuning and 5 minutes for the actual benchmark

That feel pretty ok.

Oct 15 '21 09:10 KristofferC

I think this is one ready now. Can't fix the issue to run subsets of the test with this PR, because this is an issue on the PkgBenchmark backend interface. Will suggest a fix in this package in the future.

Oct 20 '21 13:10 termi-official

I took a look into implementing running subsets of the benchmark but cannot find a good angle on this. Problem is consistent integration with the tuning procedure. Nonetheless, I think this implementation is helpful to compare different approaches and to catch performance regressions, especially since we currently refactor a bit and add quite a bit new features. I think I can just add two additional CI jobs to cover such benchmarks (for nightly and LTS) via BenchmarkCI.jl.

Oct 18 '22 22:10 termi-official

Would be good with some docs on how to run, for example how to compare a PR with master. A make target in a benchmarks/Makefile that does this would be nice to have.

Feb 28 '23 13:02 fredrikekre

I added a makefile for the most common tasks and some docs. Also, I manually split up the benchmarks into small groups which can be executed individually for more fine-grained control (since PkgBenchmark.jl does not have such a feature).

Feb 28 '23 17:02 termi-official

Ferrite.jl Ferrite.jl copied to clipboard

Performance regression tests

Benchmark Coverage

What is missing

Codecov Report

Ferrite.jl
Ferrite.jl copied to clipboard