Ferrite.jl
Implement a proper sparsity pattern
This PR introduces a `SparsityPattern` struct as a way to work with the sparsity pattern directly instead of producing a matrix immediately. Some advantages of this are (a small sketch of the workflow follows below):
- Makes it possible to add custom entries to the pattern (e.g. insert the necessary entries for a Lagrange multiplier).
- The size is no longer limited to `ndofs(dh) x ndofs(dh)`, so it is easy to insert new dofs.
- Simpler to support new matrix types. Before this patch everything in the construction was tightly coupled to `SparseMatrixCSC` (e.g. how element connections and constraint condensation were implemented). With this patch everything is handled on the `SparsityPattern` level, independent of what type of matrix will be instantiated in the end. All that is needed to support a new matrix type is now a `MatrixType(::SparsityPattern)` constructor.
- Enables instantiation of multiple matrices based on the same pattern (e.g. stiffness and mass matrices).
- Faster and lower memory usage (benchmarks incoming).
Supersedes #596?
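To make the intended workflow concrete, here is a small sketch using the function names from the benchmark script further down in the thread (`init_sparsity_pattern`, `add_cell_entries!`, `allocate_matrix`); the exact names were still being discussed at this point:

```julia
using Ferrite

grid = generate_grid(Hexahedron, (10, 10, 10))
dh = DofHandler(grid)
add!(dh, :u, Lagrange{RefHexahedron, 1}()^3)
close!(dh)

sp = init_sparsity_pattern(dh)  # an empty pattern, not yet a matrix
add_cell_entries!(sp, dh)       # standard element couplings (assuming the
                                # constraint handler argument is optional)
# custom entries (e.g. for a Lagrange multiplier) could be added here
K = allocate_matrix(sp)         # instantiate a SparseMatrixCSC from the pattern
M = allocate_matrix(sp)         # a second matrix from the same pattern
```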
Codecov Report
Attention: Patch coverage is 97.87798% with 8 lines in your changes missing coverage. Please review.
Project coverage is 93.69%. Comparing base (355dfc8) to head (aff9e54).

Files | Patch % | Lines
---|---|---
src/Dofs/sparsity_pattern.jl | 97.38% | 5 Missing :warning:
src/PoolAllocator.jl | 97.70% | 3 Missing :warning:
Additional details and impacted files
```diff
@@            Coverage Diff             @@
##           master     #888      +/-   ##
==========================================
- Coverage   93.79%   93.69%   -0.11%
==========================================
  Files          36       38       +2
  Lines        5499     5723     +224
==========================================
+ Hits         5158     5362     +204
- Misses        341      361      +20
```
Before cleaning up and finishing this PR there are some things I would like input on:

1. `create_sparsity_pattern(dh::DofHandler, ...)` still exists but now returns a sparsity pattern (and not the matrix). If you don't update your code there will be an error when you try `start_assemble`, which we can intercept and print a nice message (you might run into a `MethodError` earlier though). I think this is an OK change.
2. The new function `create_matrix` can be used as a direct replacement for `create_sparsity_pattern`, so when you don't care about modifying the pattern and just want a `SparseMatrixCSC` you can use `create_matrix(dh)` (see the sketch after this list).
3. `create_matrix` can now also be used to pass the type of sparse matrix you want, e.g. `create_matrix(SparseMatrixCSC{Float32, Int}, sparsity_pattern)`. Many matrix types are not so nice to write like this though (e.g. `SparseMatrixCSR` has a first type parameter which corresponds to the indexing...) and for a block matrix you need something like `BlockMatrix{Float64, Matrix{SparseMatrixCSC{Float64, Int}}}` to fully specify the type. Of course we can make `create_matrix(BlockMatrix, sparsity_pattern)` work and fill in the defaults, but if you don't want the defaults it will be pretty clunky. One idea is to provide aliases in Ferrite, e.g. `Ferrite.MatrixTypes.CSRMatrix{Tv, Ti}`. Thoughts?
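A minimal sketch of these call styles, with the names as they stand at this point in the PR (`create_matrix` is later renamed; see the `allocate_matrix` discussion further down):

```julia
using Ferrite, SparseArrays

grid = generate_grid(Quadrilateral, (2, 2))
dh = DofHandler(grid)
add!(dh, :u, Lagrange{RefQuadrilateral, 1}())
close!(dh)

sp  = create_sparsity_pattern(dh)                       # now returns a pattern, not a matrix
K   = create_matrix(dh)                                 # direct replacement: a SparseMatrixCSC
K32 = create_matrix(SparseMatrixCSC{Float32, Int}, sp)  # explicitly chosen matrix type
```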
> Before cleaning up and finishing this PR there are some things I would like input on:
>
> 1. `create_sparsity_pattern(dh::DofHandler, ...)` still exists but now returns a sparsity pattern (and not the matrix). If you don't update your code there will be an error when you try `start_assemble`, which we can intercept and print a nice message (you might run into a `MethodError` earlier though). I think this is an OK change.

I think this is a good idea, not only for this reason. New users might not understand the difference between a sparsity pattern and the actual sparse matrix and might try to pass the pattern to the assembler anyway.
Alternatively we can also not error in this case and just assemble into a standard matrix format, which can be queried by `finalize_assemble(assembler)`.
> 2. The new function `create_matrix` can be used as a direct replacement for `create_sparsity_pattern`, so when you don't care about modifying the pattern and just want a `SparseMatrixCSC` you can use `create_matrix(dh)`.

I think this is a great convenience for a large portion of our user base.
> 3. `create_matrix` can now also be used to pass the type of sparse matrix you want, e.g. `create_matrix(SparseMatrixCSC{Float32, Int}, sparsity_pattern)`. Many matrix types are not so nice to write like this though (e.g. `SparseMatrixCSR` has a first type parameter which corresponds to the indexing...) and for a block matrix you need something like `BlockMatrix{Float64, Matrix{SparseMatrixCSC{Float64, Int}}}` to fully specify the type. Of course we can make `create_matrix(BlockMatrix, sparsity_pattern)` work and fill in the defaults, but if you don't want the defaults it will be pretty clunky. One idea is to provide aliases in Ferrite, e.g. `Ferrite.MatrixTypes.CSRMatrix{Tv, Ti}`. Thoughts?

I think I am not understanding the issue here. If we want to have a very specific matrix, then we should be specific here. Not sure if `Ferrite.MatrixTypes` is a better way to go here, because it makes extensions a bit more clunky.
Nice work @fredrikekre!
- I agree that this is a good change, and think it is reasonable that `start_assemble(create_sparsity_pattern(dh))` errors.
- `create_matrix(dh, [ch])` is nice! ~~IMO also supporting `create_matrix(MatrixType, dh, [ch])` would make sense (didn't check if already supported)~~ (edit: already there)
- I think adding matrix specification conveniences should be postponed until we gather some experience with the new setup. If some cases are frequently used, it could make sense to provide shortcuts.
Perhaps a stupid question, but would it not make sense to support `sp = create_sparsity_pattern(MatrixType, dh, ch)` to allow easy use for extensions such as `BlockSparsityPattern`?
> I think I am not understanding the issue here. If we want to have a very specific matrix, then we should be specific here. Not sure if `Ferrite.MatrixTypes` is a better way to go here, because it makes extensions a bit more clunky.

The issue is that it looks ugly to have something like

```julia
A = create_matrix(BlockMatrix{Float64, Matrix{SparseMatrixCSR{1, Float64, Int64}}}, sparsity_pattern)
```

But with aliases we could have e.g.

```julia
A = create_matrix(MatrixTypes.BlockMatrix{MatrixTypes.CSRMatrix{Float64, Int}}, sparsity_pattern)
```

A method for that would still be defined in an extension. The point is that type parameters aren't always "user-friendly", so with `MatrixTypes` we could normalize them a bit. But we can do this later as Knut points out.
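For illustration, a hypothetical sketch of what such an alias could look like; neither `MatrixTypes` nor `CSRMatrix` exists in Ferrite, and `SparseMatricesCSR` is just the assumed CSR backend here:

```julia
module MatrixTypes

using SparseMatricesCSR: SparseMatrixCSR

# Hide the index-base parameter (the leading `1`) behind a friendlier name,
# so users only spell out the value and index types:
const CSRMatrix{Tv, Ti} = SparseMatrixCSR{1, Tv, Ti}

end

# MatrixTypes.CSRMatrix{Float64, Int} === SparseMatrixCSR{1, Float64, Int}
```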
> Perhaps a stupid question, but would it not make sense to support `sp = create_sparsity_pattern(MatrixType, dh, ch)` to allow easy use for extensions such as `BlockSparsityPattern`?

The matrix type doesn't necessarily map directly to one pattern type. And if you want a specific pattern type, why not call that constructor directly (`create_sparsity_pattern!(MyPattern(...), dh, ch)`)?
As it is right now, `DofHandler` defaults to `SparsityPattern`, which then defaults to `SparseMatrixCSC`.
> why not call that constructor directly (`create_sparsity_pattern!(MyPattern(...), dh, ch)`)?

If `MyPattern === BlockSparsityPattern` it wouldn't be available as it is defined in the extension; wasn't that the reason for the hack earlier, which then could be avoided?
(I see that this wouldn't always work, but for the extensions for special matrix types, I figured that could be convenient.)
`BlockSparsityPattern` is in Ferrite now, but even with the hack you could still call the constructor (though you would get an error telling you to load the needed package, which you would have to do anyway in order to pass the matrix type?).
Note that the assembler itself is still tightly coupled to `SparseMatrixCSC`.
I have another consideration which came to my mind.
Is `create_sparsity_pattern!` really the name which we want here? I would argue that we might want to have a `create_default_sparsity_pattern!` function, which internally calls something like `add_H1_sparsity_pattern` to create the default sparsity pattern for standard H1 functions. Then we could factor out the face-coupling pattern (for DG) into a function `add_DG_sparsity_pattern`. This way we might be able to modularize the sparsity pattern construction a bit for different use-cases. What do you think?
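For illustration, a hypothetical sketch of this layering on top of the functions this PR already provides (`add_cell_entries!` and `add_constraint_entries!`, as used in the benchmark script further down); the `add_H1_sparsity_pattern` and `create_default_sparsity_pattern!` names are only the suggestion above:

```julia
using Ferrite

# H1 couplings are just the element-local (cell) couplings:
add_H1_sparsity_pattern(sp, dh, ch) = add_cell_entries!(sp, dh, ch)

function create_default_sparsity_pattern!(sp, dh, ch)
    add_H1_sparsity_pattern(sp, dh, ch)  # standard H1 couplings
    add_constraint_entries!(sp, ch)      # e.g. couplings from periodicity constraints
    return sp
end

# A DG variant could add face couplings via a separate
# add_DG_sparsity_pattern(sp, dh, topology) when the weak form has interface terms.
```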
> Is `create_sparsity_pattern!` really the name which we want here?

To me, it makes sense to have this function do the "right thing" (such that it will work) given the inputs. I thought that if you give it a dofhandler with DG interpolations it will call `cross_element_coupling!`? And if you give a constraint handler, it will call `constraint_couplings!` etc. (But maybe I misunderstood your explanation earlier @fredrikekre?)
For advanced use-cases (for example if you have additional info that can increase the sparsity), I think the separation with the extra functions is really nice, but I don't think we need to add extra convenience for many different cases (at least not yet).
Yea, right now `create_sparsity_pattern!` is a bit of a do-it-all function.
> > Is `create_sparsity_pattern!` really the name which we want here?
>
> To me, it makes sense to have this function do the "right thing" (such that it will work) given the inputs. I thought that if you give it a dofhandler with DG interpolations it will call `cross_element_coupling!`? And if you give a constraint handler, it will call `constraint_couplings!` etc. (But maybe I misunderstood your explanation earlier @fredrikekre?)

This only happens if you also pass the topology. You cannot differentiate between DG and "normal" P0 at the interpolation level, as the coupling is a property of the weak form and not of the interpolation.
Any thoughts on https://github.com/Ferrite-FEM/Ferrite.jl/pull/888#discussion_r1635597188? Perhaps also change `create_matrix` -> `alloc_matrix` or `allocate_matrix` to make it more clear that there is no assembly happening?
> Any thoughts on #888 (comment)? Perhaps also change `create_matrix` -> `alloc_matrix` or `allocate_matrix` to make it more clear that there is no assembly happening?

I am not sure there is really any ambiguity in the word `create` here. We can also argue for alloc/allocate/instantiate or similar. I have no preference here either.
I think `add_*` is better than `create_*`.
I am not sure about `create_matrix` or `allocate_matrix`. Both are fine with me.
I agree with @lijas that `add` is better and somehow consistent with `add!` used elsewhere in the code base.
I find that `allocate_matrix` is a better name, but it is also not very important.
`allocate` heavily suggests that the thing is "non-initialized" and has to be further "processed", so that might make sense in this case.
I am a bit uneasy about the HeapAllocator stuff. While I understand it was fun to write, is there really no way to implement whatever this is used for in pure, safe Julia?
PR one matrix
Mesh size | ndofs(dh) | time [s] | %gc | n allocs | alloc |
---|---|---|---|---|---|
10x10x10 | 29114 | 0.28 | 20% | 226 | 223 MiB |
20x20x20 | 216024 | 1.74 | 9% | 901 | 1.4 GiB |
30x30x30 | 710734 | 5.57 | 4.5% | 2.62 k | 4.3 GiB |
40x40x40 | 1663244 | 13.9 | 7.5% | 5.87 k | 9.9 GiB |
50x50x50 | 3223554 | 30.6 | 9.8% | 11.11 k | 19.2 GiB |
60x60x60 | 5541664 | 65.3 | 12.7% | 18.84 k | 32.8 GiB |
70x70x70 | 8767574 | 120.1 | 18.4% | 29.54 k | 51.7 GiB |
master one matrix
Mesh size | ndofs(dh) | time [s] | %gc | n allocs | alloc |
---|---|---|---|---|---|
10x10x10 | 29114 | 0.22 | 41.5% | 3.09 k | 608 MiB |
20x20x20 | 216024 | 1.1 | 15.9% | 3.09 k | 4.1 GiB |
30x30x30 | 710734 | 3.7 | 14.1% | 3.09 k | 12.9 GiB |
40x40x40 | 1663244 | 9.7 | 10.2% | 3.10 k | 30.4 GiB |
50x50x50 | 3223554 | 38.4 | 6.2% | 3.10 k | 58.9 GiB |
60x60x60 | 5541664 | 75.6 | 5.8% | 3.10 k | 101.4 GiB |
70x70x70 | 8767574 | 177.7 | 7.4% | 3.10 k | 171.9 GiB |
PR two matrices
Mesh size | ndofs(dh) | time [s] | %gc | n allocs | alloc |
---|---|---|---|---|---|
10x10x10 | 29114 | 0.3 | 17.9% | 234 | 320 MiB |
20x20x20 | 216024 | 1.9 | 8.6% | 909 | 2.1 GiB |
30x30x30 | 710734 | 6.1 | 5.3% | 2.62 k | 6.6 GiB |
40x40x40 | 1663244 | 14.8 | 8.1% | 5.87 k | 15.4 GiB |
50x50x50 | 3223554 | 33.7 | 9.0% | 11.12 k | 29.6 GiB |
60x60x60 | 5541664 | 72.6 | 12.8% | 18.85 k | 50.8 GiB |
70x70x70 | 8767574 | 134.9 | 17.8% | 29.55 k | 80.1 GiB |
master two matrices
Mesh size | ndofs(dh) | time [s] | %gc | n allocs | alloc |
---|---|---|---|---|---|
10x10x10 | 29114 | 0.34 | 30.2% | 3.16 k | 1.2 GiB |
20x20x20 | 216024 | 2.27 | 22.2% | 3.16 k | 8.3 GiB |
30x30x30 | 710734 | 7.44 | 15.6% | 3.17 k | 25.8 GiB |
40x40x40 | 1663244 | 20.85 | 11.6% | 3.18 k | 60.8 GiB |
50x50x50 | 3223554 | 80.63 | 7.2% | 3.18 k | 117.8 GiB |
60x60x60 | 5541664 | 155.89 | 6.7% | 3.18 k | 202.7 GiB |
(70x70x70 OOM killed)
Code for PR

```julia
using Ferrite

n = 10
const grid = generate_grid(Hexahedron, (n, n, n))
const top = ExclusiveTopology(grid)
const dh = DofHandler(grid)
add!(dh, :u, Lagrange{RefHexahedron, 2}()^3)
add!(dh, :p, Lagrange{RefHexahedron, 1}())
close!(dh)
const ch = ConstraintHandler(dh)
const facets = collect_periodic_facets(grid)
add!(ch, PeriodicDirichlet(:u, facets))
close!(ch)

function single_matrix(dh, ch)
    dsp = init_sparsity_pattern(dh)
    add_cell_entries!(dsp, dh, ch)
    add_constraint_entries!(dsp, ch)
    K = allocate_matrix(dsp)
    return K
end

function double_matrix(dh, ch)
    dsp = init_sparsity_pattern(dh)
    add_cell_entries!(dsp, dh, ch)
    add_constraint_entries!(dsp, ch)
    K = allocate_matrix(dsp)
    M = allocate_matrix(dsp)
    return K, M
end

@time single_matrix(dh, ch)
@time double_matrix(dh, ch)
```
Code for master

```julia
using Ferrite

n = 10
const grid = generate_grid(Hexahedron, (n, n, n))
const top = ExclusiveTopology(grid)
const dh = DofHandler(grid)
add!(dh, :u, Lagrange{RefHexahedron, 2}()^3)
add!(dh, :p, Lagrange{RefHexahedron, 1}())
close!(dh)
const ch = ConstraintHandler(dh)
const facets = collect_periodic_facets(grid)
add!(ch, PeriodicDirichlet(:u, facets))
close!(ch)

function single_matrix(dh, ch)
    K = create_sparsity_pattern(dh, ch)
    return K
end

function double_matrix(dh, ch)
    K = create_sparsity_pattern(dh, ch)
    M = create_sparsity_pattern(dh, ch)
    return K, M
end

@time single_matrix(dh, ch)
@time double_matrix(dh, ch)
```
> I am a bit uneasy about the HeapAllocator stuff. While I understand it was fun to write, is there really no way to implement whatever this is used for in pure, safe Julia?

I simplified it greatly and in particular removed the manual malloc and pointer computation. Now it is just a way to batch allocate. It is very similar to what Knut did in #974 except that it batches based on size so that it becomes a bit easier to reuse memory. I think we can merge the two implementations in the future.
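For context, a minimal sketch of the size-batched pooling idea (not the actual PoolAllocator.jl implementation): freed buffers are cached per power-of-two size class so that later requests of a similar size can reuse them.

```julia
struct Pool{T}
    buckets::Dict{Int, Vector{Vector{T}}}  # size class (log2 of length) => free buffers
end
Pool{T}() where {T} = Pool{T}(Dict{Int, Vector{Vector{T}}}())

function acquire!(p::Pool{T}, n::Integer) where {T}
    k = n <= 1 ? 0 : ceil(Int, log2(n))     # round the request up to a power-of-two class
    free = get!(Vector{Vector{T}}, p.buckets, k)
    return isempty(free) ? Vector{T}(undef, 2^k) : pop!(free)
end

function release!(p::Pool{T}, v::Vector{T}) where {T}
    k = round(Int, log2(length(v)))         # lengths are always powers of two here
    push!(get!(Vector{Vector{T}}, p.buckets, k), v)
    return nothing
end
```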
Thanks for the comprehensive benchmarks!
- Is there anything that can be done about the performance regression for one-matrix systems below 40^3? I think this size (1.6 M dofs) is quite common in Ferrite, and e.g. mesh adaptivity could suffer here. (Is there a difference in 2D with a less dense sparsity pattern?)
- Out of curiosity: do you know why the number of allocs on master barely changes when creating one or two matrices? From the code I would expect exactly double, as the alloc size does.
And another question, why is the sparsity pattern row-major? Seems like this adds a small perf-hit with the extra buffer for standard CSC matrices?
> Is there anything that can be done about the performance regression for one-matrix systems below 40^3? I think this size (1.6 M dofs) is quite common in Ferrite, and e.g. mesh adaptivity could suffer here. (Is there a difference in 2D with a less dense sparsity pattern?)

It is a trade-off between memory and time -- master uses quite a lot more memory. In my experience that has been the limiting factor. Compared to a linear solve of this size I am not too worried. I suppose one should also factor in that we now have quite a lot more flexibility. We can keep the old memory-hungry algorithm for `allocate_matrix(dh)` but I don't think it is worth the duplicated code. I can check 2D too.
> Do you know why the number of allocs on master barely changes when creating one or two matrices?

That does look a bit strange, I will have a look again.
> That does look a bit strange, I will have a look again.

I didn't run the function first so almost all the allocations are from compilation. I can update the table later.
> We can keep the old memory-hungry algorithm for `allocate_matrix(dh)` but I don't think it is worth the duplicated code.

I also don't think that is worth it (and there is no doubt that this PR is an improvement; I was more curious whether something could be done to get the same perf).
Without compilation:
PR one matrix
Mesh size | ndofs(dh) | time [s] | %gc | n allocs | alloc |
---|---|---|---|---|---|
10x10x10 | 29114 | 0.2 s | 0.0% | 226 | 223.204 MiB |
20x20x20 | 216024 | 1.7 s | 3.0% | 901 | 1.354 GiB |
30x30x30 | 710734 | 5.8 s | 6.9% | 2615 | 4.318 GiB |
40x40x40 | 1663244 | 14.6 s | 8.2% | 5865 | 9.985 GiB |
50x50x50 | 3223554 | 30.6 s | 10.0% | 11107 | 19.177 GiB |
master one matrix
Mesh size | ndofs(dh) | time [s] | %gc | n allocs | alloc |
---|---|---|---|---|---|
10x10x10 | 29114 | 0.1 s | 0.0% | 74 | 608.068 MiB |
20x20x20 | 216024 | 1.2 s | 21.7% | 75 | 4.234 GiB |
30x30x30 | 710734 | 3.9 s | 12.2% | 76 | 12.882 GiB |
40x40x40 | 1663244 | 16.5 s | 7.2% | 81 | 30.423 GiB |
50x50x50 | 3223554 | 42.2 s | 6.0% | 82 | 58.908 GiB |
PR two matrices
Mesh size | ndofs(dh) | time [s] | %gc | n allocs | alloc |
---|---|---|---|---|---|
10x10x10 | 29114 | 0.3 s | 0.0% | 234 | 320.632 MiB |
20x20x20 | 216024 | 1.9 s | 5.4% | 909 | 2.058 GiB |
30x30x30 | 710734 | 6.2 s | 7.6% | 2623 | 6.628 GiB |
40x40x40 | 1663244 | 15.6 s | 8.3% | 5873 | 15.382 GiB |
50x50x50 | 3223554 | 37.7 s | 10.4% | 11115 | 29.627 GiB |
master two matrices
Mesh size | ndofs(dh) | time [s] | %gc | n allocs | alloc |
---|---|---|---|---|---|
10x10x10 | 29114 | 0.3 s | 18.3% | 148 | 1.188 GiB |
20x20x20 | 216024 | 2.3 s | 25.5% | 150 | 8.468 GiB |
30x30x30 | 710734 | 8.1 s | 14.0% | 152 | 25.764 GiB |
40x40x40 | 1663244 | 28.9 s | 9.7% | 162 | 60.846 GiB |
50x50x50 | 3223554 | 87.6 s | 7.0% | 164 | 117.816 GiB |
The main parts are `searchsortedfirst` and `insert` (with `setindex` being a major part) since we want to insert entries in sorted order, and I don't really know how to speed up those parts. In the old algorithm we just inserted everything blindly, which means a lot of duplicates (and thus more memory etc).
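For illustration, a minimal sketch of the two insertion strategies for a single row of the pattern (simplified; not the PR's actual data structures):

```julia
# Sorted insertion: binary search, skip duplicates, shift the tail on insert.
function add_sorted!(row::Vector{Int}, col::Int)
    k = searchsortedfirst(row, col)      # binary search for the insertion point
    if k > length(row) || row[k] != col  # only insert if not already present
        insert!(row, k, col)             # shifts everything after position k; the hot spot
    end
    return row
end

# Old approach: push blindly, allow duplicates, sort and deduplicate at the end.
add_blind!(row::Vector{Int}, col::Int) = push!(row, col)
dedup!(row::Vector{Int}) = unique!(sort!(row))
```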