
Implement a proper sparsity pattern

fredrikekre opened this pull request 11 months ago • 9 comments

This PR introduces `struct SparsityPattern` as a way to work with the sparsity pattern directly, instead of immediately producing a matrix.

Some advantages of this are:

  • Makes it possible to add custom entries to the pattern (e.g. insert the entries required for a Lagrange multiplier); see the sketch after this list.
  • The size is no longer limited to `ndofs(dh) x ndofs(dh)`, so it is easy to insert new dofs.
  • Simpler to support new matrix types. Before this patch everything in the construction was tightly coupled to `SparseMatrixCSC` (e.g. how element connections and constraint condensation were implemented). With this patch everything is handled at the `SparsityPattern` level, independent of the type of matrix that is instantiated in the end. All that is needed to support a new matrix type is a `MatrixType(::SparsityPattern)` constructor.
  • Enables instantiating multiple matrices from the same pattern (e.g. stiffness and mass matrices).
  • Faster, with lower memory usage (benchmarks incoming).
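A minimal sketch of the workflow this enables (hedged: the function names follow the benchmark code further down in this thread, plus an assumed sized `SparsityPattern(nrows, ncols)` constructor and a single-entry `add_entry!`; the extra Lagrange-multiplier dof is a made-up example):

```julia
using Ferrite

grid = generate_grid(Quadrilateral, (20, 20))
dh = DofHandler(grid)
add!(dh, :u, Lagrange{RefQuadrilateral, 1}())
close!(dh)

# One extra row/column for a (hypothetical) Lagrange multiplier dof
n = ndofs(dh)
sp = SparsityPattern(n + 1, n + 1)   # assumed sized constructor
add_cell_entries!(sp, dh)            # the usual dof couplings

# Custom entries: couple the multiplier dof to every dof, both directions
for j in 1:n
    Ferrite.add_entry!(sp, n + 1, j)
    Ferrite.add_entry!(sp, j, n + 1)
end

# Several matrices can be instantiated from the same pattern
K = allocate_matrix(sp)  # e.g. stiffness
M = allocate_matrix(sp)  # e.g. mass
```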

fredrikekre avatar Feb 29 '24 12:02 fredrikekre

Supersedes #596?

termi-official avatar Feb 29 '24 16:02 termi-official

Codecov Report

Attention: Patch coverage is 97.87798% with 8 lines in your changes missing coverage. Please review.

Project coverage is 93.69%. Comparing base (355dfc8) to head (aff9e54).

| Files | Patch % | Lines |
| --- | --- | --- |
| src/Dofs/sparsity_pattern.jl | 97.38% | 5 Missing :warning: |
| src/PoolAllocator.jl | 97.70% | 3 Missing :warning: |
Additional details and impacted files
```diff
@@            Coverage Diff             @@
##           master     #888      +/-   ##
==========================================
- Coverage   93.79%   93.69%   -0.11%
==========================================
  Files          36       38       +2
  Lines        5499     5723     +224
==========================================
+ Hits         5158     5362     +204
- Misses        341      361      +20
```

:umbrella: View full report in Codecov by Sentry.

codecov[bot] avatar Mar 01 '24 00:03 codecov[bot]

Before cleaning up and finishing this PR there are some things I would like input on:

  1. `create_sparsity_pattern(dh::DofHandler, ...)` still exists but now returns a sparsity pattern (and not the matrix). If you don't update your code there will be an error when you call `start_assemble`, which we can intercept to print a nice message (you might run into a `MethodError` earlier, though). I think this is an OK change.
  2. The new function `create_matrix` can be used as a direct replacement for `create_sparsity_pattern`, so when you don't care about modifying the pattern and just want a `SparseMatrixCSC` you can use `create_matrix(dh)`.
  3. `create_matrix` can also now be used to pass the type of sparse matrix you want, e.g. `create_matrix(SparseMatrixCSC{Float32, Int}, sparsity_pattern)` (see the sketch after this list). Many matrix types are not so nice to write like this though (e.g. `SparseMatrixCSR` has a first type parameter which corresponds to the index base…) and for a block matrix you need something like `BlockMatrix{Float64, Matrix{SparseMatrixCSC{Float64, Int}}}` to fully specify the type. Of course we can make `create_matrix(BlockMatrix, sparsity_pattern)` work and fill in the defaults, but if you don't want the defaults it will be pretty clunky. One idea is to provide aliases in Ferrite, e.g. `Ferrite.MatrixTypes.CSRMatrix{Tv, Ti}`. Thoughts?
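For item 3, what the two forms look like in practice (a sketch; `sparsity_pattern` and `dh` are assumed to exist as in the examples above):

```julia
using Ferrite, SparseArrays

# Explicitly specifying the matrix type:
A32 = create_matrix(SparseMatrixCSC{Float32, Int}, sparsity_pattern)

# Without the type argument you get the default SparseMatrixCSC{Float64, Int}:
A = create_matrix(dh)
```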

fredrikekre avatar Mar 14 '24 12:03 fredrikekre

> Before cleaning up and finishing this PR there are some things I would like input on:
>
> 1. `create_sparsity_pattern(dh::DofHandler, ...)` still exists but now returns a sparsity pattern (and not the matrix). If you don't update your code there will be an error when you call `start_assemble`, which we can intercept to print a nice message (you might run into a `MethodError` earlier, though). I think this is an OK change.

I think this is a good idea, not only for this reason. New users might not understand the difference between the sparsity pattern and the actual sparse matrix and might try to pass the pattern to the assembler anyway.

Alternatively, we could also not error in this case and just assemble into a standard matrix format, which could then be queried via `finalize_assemble(assembler)`.
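A sketch of that alternative (entirely hypothetical: `start_assemble` does not accept a `SparsityPattern`, and `finalize_assemble` does not exist; the names just follow the suggestion above):

```julia
# Hypothetical flow: assemble against the pattern, materialize at the end.
assembler = start_assemble(sp)               # sp::SparsityPattern, no matrix yet
ke = zeros(ndofs_per_cell(dh), ndofs_per_cell(dh))
for cell in CellIterator(dh)
    fill!(ke, 0)
    # ... compute the element matrix ke ...
    assemble!(assembler, celldofs(cell), ke)
end
K = finalize_assemble(assembler)             # materialize, e.g. a SparseMatrixCSC
```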

> 2. The new function `create_matrix` can be used as a direct replacement for `create_sparsity_pattern`, so when you don't care about modifying the pattern and just want a `SparseMatrixCSC` you can use `create_matrix(dh)`.

I think this is a great convenience for a large portion of our user base.

> 3. `create_matrix` can also now be used to pass the type of sparse matrix you want, e.g. `create_matrix(SparseMatrixCSC{Float32, Int}, sparsity_pattern)`. Many matrix types are not so nice to write like this though (e.g. `SparseMatrixCSR` has a first type parameter which corresponds to the index base…) and for a block matrix you need something like `BlockMatrix{Float64, Matrix{SparseMatrixCSC{Float64, Int}}}` to fully specify the type. Of course we can make `create_matrix(BlockMatrix, sparsity_pattern)` work and fill in the defaults, but if you don't want the defaults it will be pretty clunky. One idea is to provide aliases in Ferrite, e.g. `Ferrite.MatrixTypes.CSRMatrix{Tv, Ti}`. Thoughts?

I think I am not understanding the issue here. If we want a very specific matrix, then we should be specific here. Not sure if `Ferrite.MatrixTypes` is a better way to go, because it makes extensions a bit more clunky.

termi-official avatar Mar 14 '24 13:03 termi-official

Nice work @fredrikekre!

  1. I agree that this is a good change, and think it is reasonable that `start_assemble(create_sparsity_pattern(dh))` errors.
  2. `create_matrix(dh, [ch])` is nice! ~IMO also supporting `create_matrix(MatrixType, dh, [ch])` would make sense (didn't check if already supported)~ (edit: already there)
  3. I think adding matrix-specification conveniences should be postponed until we have gathered some experience with the new setup. If some cases turn out to be frequently used, it could make sense to provide shortcuts.

KnutAM avatar Mar 14 '24 13:03 KnutAM

Perhaps a stupid question, but would it not make sense to support `sp = create_sparsity_pattern(MatrixType, dh, ch)` to allow easy use for extensions such as `BlockSparsityPattern`?

KnutAM avatar Mar 14 '24 13:03 KnutAM

> I think I am not understanding the issue here. If we want a very specific matrix, then we should be specific here. Not sure if `Ferrite.MatrixTypes` is a better way to go, because it makes extensions a bit more clunky.

The issue is that it looks ugly to have something like

```julia
A = create_matrix(BlockMatrix{Float64, Matrix{SparseMatrixCSR{1, Float64, Int64}}}, sparsity_pattern)
```

But with aliases we could have e.g.

```julia
A = create_matrix(MatrixTypes.BlockMatrix{MatrixTypes.CSRMatrix{Float64, Int}}, sparsity_pattern)
```

A method for that would still be defined in an extension. The point is that type parameters aren't always "user-friendly", so with `MatrixTypes` we could normalize them a bit. But we can do this later, as Knut points out.
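For concreteness, such an alias could be as small as this (hypothetical: `MatrixTypes` does not exist in Ferrite; the sketch assumes SparseMatricesCSR.jl, where the first type parameter of `SparseMatrixCSR` is the index base):

```julia
module MatrixTypes

using SparseMatricesCSR: SparseMatrixCSR

# Pin the index base to 1 (the Julia default) so users only need to
# spell out the value and index types:
const CSRMatrix{Tv, Ti} = SparseMatrixCSR{1, Tv, Ti}

end
```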

> Perhaps a stupid question, but would it not make sense to support `sp = create_sparsity_pattern(MatrixType, dh, ch)` to allow easy use for extensions such as `BlockSparsityPattern`?

The matrix type doesn't necessarily map directly to one pattern type. And if you want a specific pattern type, why not call that constructor directly (`create_sparsity_pattern!(MyPattern(...), dh, ch)`)?

As it is right now, `DofHandler` defaults to `SparsityPattern`, which then defaults to `SparseMatrixCSC`.

fredrikekre avatar Mar 14 '24 22:03 fredrikekre

> why not call that constructor directly (`create_sparsity_pattern!(MyPattern(...), dh, ch)`)?

If `MyPattern === BlockSparsityPattern` it wouldn't be available, since it is defined in the extension. Wasn't that the reason for the hack earlier, which could then be avoided? (I see that this wouldn't always work, but for the extensions for special matrix types I figured it could be convenient.)

KnutAM avatar Mar 14 '24 22:03 KnutAM

`BlockSparsityPattern` is in Ferrite now, but even with the hack you could still call the constructor (and get an error telling you to load the needed package, which you would have to do anyway in order to pass the matrix type?).

fredrikekre avatar Mar 14 '24 22:03 fredrikekre

Note that the assembler itself is still tightly coupled to `SparseMatrixCSC`.

termi-official avatar May 15 '24 11:05 termi-official

I have another consideration which came to mind.

Is `create_sparsity_pattern!` really the name we want here? I would argue that we might want a `create_default_sparsity_pattern!` function, which internally calls something like `add_H1_sparsity_pattern!` to create the default sparsity pattern for standard H1 functions. Then we could factor out the face-coupling pattern (for DG) into a function `add_DG_sparsity_pattern!`. This way we might be able to modularize the sparsity pattern construction a bit for different use cases. What do you think?
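A sketch of what that decomposition might look like (purely hypothetical; none of these functions exist under these names, they just follow the suggestion above):

```julia
# Hypothetical modular construction; names follow the comment above.
function create_default_sparsity_pattern!(sp, dh, ch = nothing; topology = nothing)
    # Standard cell-local coupling for continuous (H1) fields
    add_H1_sparsity_pattern!(sp, dh)
    # Face coupling across element boundaries, needed e.g. for DG
    topology === nothing || add_DG_sparsity_pattern!(sp, dh, topology)
    # Couplings introduced by affine constraints (periodicity etc.)
    ch === nothing || add_constraint_entries!(sp, ch)
    return sp
end
```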

termi-official avatar May 22 '24 11:05 termi-official

> Is `create_sparsity_pattern!` really the name we want here?

To me, it makes sense for this function to do the "right thing" (such that it will work) given the inputs. I thought that if you give it a dofhandler with DG interpolations it will call `cross_element_coupling!`? And if you give it a constraint handler, it will call `constraint_couplings!`, etc. (But maybe I misunderstood your explanation earlier, @fredrikekre?)

For advanced use cases (for example if you have additional info that can increase the sparsity), I think the separation into extra functions is really nice, but I don't think we need to add extra convenience for many different cases (at least not yet).

KnutAM avatar May 23 '24 10:05 KnutAM

Yeah, right now `create_sparsity_pattern!` is a bit of a do-it-all function.

fredrikekre avatar May 23 '24 10:05 fredrikekre

> Is `create_sparsity_pattern!` really the name we want here?

> To me, it makes sense for this function to do the "right thing" (such that it will work) given the inputs. I thought that if you give it a dofhandler with DG interpolations it will call `cross_element_coupling!`? And if you give it a constraint handler, it will call `constraint_couplings!`, etc. (But maybe I misunderstood your explanation earlier, @fredrikekre?)

This only happens if you also pass the topology. You cannot differentiate between DG and "normal" P0 at the interpolation level, as the coupling is a property of the weak form and not of the interpolation.

termi-official avatar May 23 '24 11:05 termi-official

Any thoughts on https://github.com/Ferrite-FEM/Ferrite.jl/pull/888#discussion_r1635597188? Perhaps also change `create_matrix` -> `alloc_matrix` or `allocate_matrix` to make it clearer that no assembly is happening?

fredrikekre avatar Jun 17 '24 07:06 fredrikekre

> Any thoughts on #888 (comment)? Perhaps also change `create_matrix` -> `alloc_matrix` or `allocate_matrix` to make it clearer that no assembly is happening?

I am not sure there really is any ambiguity in the word "create" here. One could equally argue for `alloc`/`allocate`/`instantiate` or similar. I have no preference either way.

termi-official avatar Jun 17 '24 07:06 termi-official

I think `add_*` is better than `create_*`. I am not sure about `create_matrix` vs. `allocate_matrix`; both are fine with me.

lijas avatar Jun 17 '24 08:06 lijas

I agree with @lijas that `add` is better, and it is consistent with `add!` used elsewhere in the code base.

I find `allocate_matrix` the better name, but it is also not very important.

KnutAM avatar Jun 17 '24 09:06 KnutAM

`allocate` strongly suggests that the result is "non-initialized" and has to be further "processed", so that name might make sense in this case.

KristofferC avatar Jun 17 '24 11:06 KristofferC

I am a bit uneasy about the HeapAllocator stuff. While I understand it was fun to write, is there really no way that whatever this is used for can be done in pure, safe Julia?

KristofferC avatar Jun 20 '24 11:06 KristofferC

PR one matrix

| Mesh size | ndofs(dh) | time [s] | %gc | n allocs | alloc |
| --- | --- | --- | --- | --- | --- |
| 10x10x10 | 29114 | 0.28 | 20% | 226 | 223 MiB |
| 20x20x20 | 216024 | 1.74 | 9% | 901 | 1.4 GiB |
| 30x30x30 | 710734 | 5.57 | 4.5% | 2.62 k | 4.3 GiB |
| 40x40x40 | 1663244 | 13.9 | 7.5% | 5.87 k | 9.9 GiB |
| 50x50x50 | 3223554 | 30.6 | 9.8% | 11.11 k | 19.2 GiB |
| 60x60x60 | 5541664 | 65.3 | 12.7% | 18.84 k | 32.8 GiB |
| 70x70x70 | 8767574 | 120.1 | 18.4% | 29.54 k | 51.7 GiB |

master one matrix

| Mesh size | ndofs(dh) | time [s] | %gc | n allocs | alloc |
| --- | --- | --- | --- | --- | --- |
| 10x10x10 | 29114 | 0.22 | 41.5% | 3.09 k | 608 MiB |
| 20x20x20 | 216024 | 1.1 | 15.9% | 3.09 k | 4.1 GiB |
| 30x30x30 | 710734 | 3.7 | 14.1% | 3.09 k | 12.9 GiB |
| 40x40x40 | 1663244 | 9.7 | 10.2% | 3.10 k | 30.4 GiB |
| 50x50x50 | 3223554 | 38.4 | 6.2% | 3.10 k | 58.9 GiB |
| 60x60x60 | 5541664 | 75.6 | 5.8% | 3.10 k | 101.4 GiB |
| 70x70x70 | 8767574 | 177.7 | 7.4% | 3.10 k | 171.9 GiB |

PR two matrices

| Mesh size | ndofs(dh) | time [s] | %gc | n allocs | alloc |
| --- | --- | --- | --- | --- | --- |
| 10x10x10 | 29114 | 0.3 | 17.9% | 234 | 320 MiB |
| 20x20x20 | 216024 | 1.9 | 8.6% | 909 | 2.1 GiB |
| 30x30x30 | 710734 | 6.1 | 5.3% | 2.62 k | 6.6 GiB |
| 40x40x40 | 1663244 | 14.8 | 8.1% | 5.87 k | 15.4 GiB |
| 50x50x50 | 3223554 | 33.7 | 9.0% | 11.12 k | 29.6 GiB |
| 60x60x60 | 5541664 | 72.6 | 12.8% | 18.85 k | 50.8 GiB |
| 70x70x70 | 8767574 | 134.9 | 17.8% | 29.55 k | 80.1 GiB |

master two matrices

| Mesh size | ndofs(dh) | time [s] | %gc | n allocs | alloc |
| --- | --- | --- | --- | --- | --- |
| 10x10x10 | 29114 | 0.34 | 30.2% | 3.16 k | 1.2 GiB |
| 20x20x20 | 216024 | 2.27 | 22.2% | 3.16 k | 8.3 GiB |
| 30x30x30 | 710734 | 7.44 | 15.6% | 3.17 k | 25.8 GiB |
| 40x40x40 | 1663244 | 20.85 | 11.6% | 3.18 k | 60.8 GiB |
| 50x50x50 | 3223554 | 80.63 | 7.2% | 3.18 k | 117.8 GiB |
| 60x60x60 | 5541664 | 155.89 | 6.7% | 3.18 k | 202.7 GiB |

(70x70x70 OOM killed)

Code for PR

```julia
using Ferrite

n = 10

const grid = generate_grid(Hexahedron, (n, n, n))
const top = ExclusiveTopology(grid)
const dh = DofHandler(grid)
add!(dh, :u, Lagrange{RefHexahedron, 2}()^3)
add!(dh, :p, Lagrange{RefHexahedron, 1}())
close!(dh)

const ch = ConstraintHandler(dh)
const facets = collect_periodic_facets(grid)
add!(ch, PeriodicDirichlet(:u, facets))
close!(ch)

function single_matrix(dh, ch)
    dsp = init_sparsity_pattern(dh)
    add_cell_entries!(dsp, dh, ch)
    add_constraint_entries!(dsp, ch)
    K = allocate_matrix(dsp)
    return K
end

function double_matrix(dh, ch)
    dsp = init_sparsity_pattern(dh)
    add_cell_entries!(dsp, dh, ch)
    add_constraint_entries!(dsp, ch)
    K = allocate_matrix(dsp)
    M = allocate_matrix(dsp)
    return K, M
end

@time single_matrix(dh, ch)
@time double_matrix(dh, ch)
```

Code for master

```julia
using Ferrite

n = 10

const grid = generate_grid(Hexahedron, (n, n, n))
const top = ExclusiveTopology(grid)
const dh = DofHandler(grid)
add!(dh, :u, Lagrange{RefHexahedron, 2}()^3)
add!(dh, :p, Lagrange{RefHexahedron, 1}())
close!(dh)

const ch = ConstraintHandler(dh)
const facets = collect_periodic_facets(grid)
add!(ch, PeriodicDirichlet(:u, facets))
close!(ch)

function single_matrix(dh, ch)
    K = create_sparsity_pattern(dh, ch)
    return K
end

function double_matrix(dh, ch)
    K = create_sparsity_pattern(dh, ch)
    M = create_sparsity_pattern(dh, ch)
    return K, M
end

@time single_matrix(dh, ch)
@time double_matrix(dh, ch)
```

fredrikekre avatar Jun 27 '24 12:06 fredrikekre

> I am a bit uneasy about the HeapAllocator stuff. While I understand it was fun to write, is there really no way that whatever this is used for can be done in pure, safe Julia?

I simplified it greatly and in particular removed the manual `malloc` and pointer computation. Now it is just a way to batch allocations. It is very similar to what Knut did in #974, except that it batches based on size so that it becomes a bit easier to reuse memory. I think we can merge the two implementations in the future.
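A minimal sketch of the size-batched idea (an illustration only, not the actual PoolAllocator code): buffers are bucketed by rounded-up size, so freed buffers can be handed back out for rows of similar length instead of hitting the GC for every row.

```julia
# Illustrative size-batched pool, not Ferrite's actual PoolAllocator.
struct Pool
    free::Dict{Int, Vector{Vector{Int}}}  # bucket size => reusable buffers
end
Pool() = Pool(Dict{Int, Vector{Vector{Int}}}())

function acquire!(p::Pool, n::Int)
    bucket = nextpow(2, max(n, 8))        # round up to a power-of-two bucket
    list = get!(Vector{Vector{Int}}, p.free, bucket)
    return isempty(list) ? Vector{Int}(undef, bucket) : pop!(list)
end

# Hand a buffer back to its bucket for reuse instead of freeing it
release!(p::Pool, v::Vector{Int}) =
    push!(get!(Vector{Vector{Int}}, p.free, length(v)), v)
```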

fredrikekre avatar Jun 27 '24 21:06 fredrikekre

Thanks for the comprehensive benchmarks!

  • Is there anything that can be done about the performance regression for one-matrix systems below 40^3? I think this size (1.6 M dofs) is quite common in Ferrite, and e.g. mesh adaptivity could suffer here. (Is there a difference in 2D with a less dense sparsity pattern?)
  • Out of curiosity: do you know why the number of allocations on master barely changes between creating one and two matrices? From the code I would expect it to double exactly, as the allocated size does.

And another question: why is the sparsity pattern row-major? It seems like this adds a small perf hit from the extra buffer needed for standard CSC matrices?

KnutAM avatar Jun 28 '24 07:06 KnutAM

> Is there anything that can be done about the performance regression for one-matrix systems below 40^3? I think this size (1.6 M dofs) is quite common in Ferrite, and e.g. mesh adaptivity could suffer here. (Is there a difference in 2D with a less dense sparsity pattern?)

It is a trade-off between memory and time -- master uses quite a lot more memory, and in my experience that has been the limiting factor. Compared to a linear solve of this size I am not too worried. I suppose one should also factor in that we now have quite a lot more flexibility. We could keep the old memory-hungry algorithm for `allocate_matrix(dh)`, but I don't think it is worth the duplicated code. I can check 2D too.

> Do you know why the number of allocations on master barely changes between creating one and two matrices?

That does look a bit strange, I will have a look again.

fredrikekre avatar Jun 28 '24 07:06 fredrikekre

> That does look a bit strange, I will have a look again.

I didn't run the functions first, so almost all of the allocations were from compilation. I can update the table later.

fredrikekre avatar Jun 28 '24 07:06 fredrikekre

> We could keep the old memory-hungry algorithm for `allocate_matrix(dh)`, but I don't think it is worth the duplicated code.

I also don't think it is worth it (and there is no doubt that this PR is an improvement; I was more curious whether something could be done to get the same performance).

KnutAM avatar Jun 28 '24 07:06 KnutAM

Without compilation:

PR one matrix

| Mesh size | ndofs(dh) | time [s] | %gc | n allocs | alloc |
| --- | --- | --- | --- | --- | --- |
| 10x10x10 | 29114 | 0.2 | 0.0% | 226 | 223.204 MiB |
| 20x20x20 | 216024 | 1.7 | 3.0% | 901 | 1.354 GiB |
| 30x30x30 | 710734 | 5.8 | 6.9% | 2615 | 4.318 GiB |
| 40x40x40 | 1663244 | 14.6 | 8.2% | 5865 | 9.985 GiB |
| 50x50x50 | 3223554 | 30.6 | 10.0% | 11107 | 19.177 GiB |

master one matrix

| Mesh size | ndofs(dh) | time [s] | %gc | n allocs | alloc |
| --- | --- | --- | --- | --- | --- |
| 10x10x10 | 29114 | 0.1 | 0.0% | 74 | 608.068 MiB |
| 20x20x20 | 216024 | 1.2 | 21.7% | 75 | 4.234 GiB |
| 30x30x30 | 710734 | 3.9 | 12.2% | 76 | 12.882 GiB |
| 40x40x40 | 1663244 | 16.5 | 7.2% | 81 | 30.423 GiB |
| 50x50x50 | 3223554 | 42.2 | 6.0% | 82 | 58.908 GiB |

PR two matrices

| Mesh size | ndofs(dh) | time [s] | %gc | n allocs | alloc |
| --- | --- | --- | --- | --- | --- |
| 10x10x10 | 29114 | 0.3 | 0.0% | 234 | 320.632 MiB |
| 20x20x20 | 216024 | 1.9 | 5.4% | 909 | 2.058 GiB |
| 30x30x30 | 710734 | 6.2 | 7.6% | 2623 | 6.628 GiB |
| 40x40x40 | 1663244 | 15.6 | 8.3% | 5873 | 15.382 GiB |
| 50x50x50 | 3223554 | 37.7 | 10.4% | 11115 | 29.627 GiB |

master two matrices

| Mesh size | ndofs(dh) | time [s] | %gc | n allocs | alloc |
| --- | --- | --- | --- | --- | --- |
| 10x10x10 | 29114 | 0.3 | 18.3% | 148 | 1.188 GiB |
| 20x20x20 | 216024 | 2.3 | 25.5% | 150 | 8.468 GiB |
| 30x30x30 | 710734 | 8.1 | 14.0% | 152 | 25.764 GiB |
| 40x40x40 | 1663244 | 28.9 | 9.7% | 162 | 60.846 GiB |
| 50x50x50 | 3223554 | 87.6 | 7.0% | 164 | 117.816 GiB |

fredrikekre avatar Jun 28 '24 08:06 fredrikekre

The main parts are `searchsortedfirst` and `insert!` (with `setindex!` a major part), since we want to insert entries in sorted order, and I don't really know how to speed those up. In the old algorithm we just inserted everything blindly, which meant a lot of duplicates (and thus more memory etc.).
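For illustration, the hot loop is essentially this (a schematic sketch, not the exact PR code):

```julia
# Schematic version of the sorted insertion described above: each row
# stores its column indices sorted and unique.
function add_entry_sorted!(row_cols::Vector{Int}, col::Int)
    k = searchsortedfirst(row_cols, col)      # binary search for position
    if k > length(row_cols) || row_cols[k] != col
        insert!(row_cols, k, col)             # O(n) shift to keep order
    end
    return row_cols
end

# The old algorithm instead pushed blindly and deduplicated at the end:
add_entry_blind!(row_cols, col) = push!(row_cols, col)  # duplicates kept
```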

[Profile screenshot, 2024-06-28: `searchsortedfirst` and `insert!` dominate the runtime]

fredrikekre avatar Jun 28 '24 08:06 fredrikekre