Ferrite.jl
Introduce OrderedSets
The title says it all. This should increase cache locality for some problems (e.g. when using our generated grids), and we now have the guarantee that the iteration order for subdofhandlers and faces can be precisely controlled.
Closes https://github.com/Ferrite-FEM/Ferrite.jl/issues/631.
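As a minimal illustration of the ordering guarantee (using OrderedCollections directly, outside of Ferrite; the cell ids are made up):
using OrderedCollections
cellset = [5, 2, 9, 1]        # e.g. cell ids in the order the user added them
collect(Set(cellset))         # some hash-dependent order, not guaranteed
collect(OrderedSet(cellset))  # always [5, 2, 9, 1], i.e. insertion order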
Okay, even with this there is still quite a big performance hit when using the SubDofHandler.
using Ferrite, SparseArrays, BenchmarkTools
grid = generate_grid(Quadrilateral, (1000, 1000));
ip = Lagrange{RefQuadrilateral, 1}()
qr = QuadratureRule{RefQuadrilateral}(2)
cellvalues = CellValues(qr, ip);
dh = DofHandler(grid)
add!(dh, :u, ip)
close!(dh);
K = create_sparsity_pattern(dh)
ch = ConstraintHandler(dh);
∂Ω = union(
getfaceset(grid, "left"),
getfaceset(grid, "right"),
getfaceset(grid, "top"),
getfaceset(grid, "bottom"),
);
dbc = Dirichlet(:u, ∂Ω, (x, t) -> 0)
add!(ch, dbc);
close!(ch)
function assemble_element!(Ke::Matrix, fe::Vector, cellvalues::CellValues)
n_basefuncs = getnbasefunctions(cellvalues)
## Reset to 0
fill!(Ke, 0)
fill!(fe, 0)
## Loop over quadrature points
for q_point in 1:getnquadpoints(cellvalues)
## Get the quadrature weight
dΩ = getdetJdV(cellvalues, q_point)
## Loop over test shape functions
for i in 1:n_basefuncs
δu = shape_value(cellvalues, q_point, i)
∇δu = shape_gradient(cellvalues, q_point, i)
## Add contribution to fe
fe[i] += δu * dΩ
## Loop over trial shape functions
for j in 1:n_basefuncs
∇u = shape_gradient(cellvalues, q_point, j)
## Add contribution to Ke
Ke[i, j] += (∇δu ⋅ ∇u) * dΩ
end
end
end
return Ke, fe
end
function assemble_global(cellvalues::CellValues, K::SparseMatrixCSC, dh::DofHandler)
## Allocate the element stiffness matrix and element force vector
n_basefuncs = getnbasefunctions(cellvalues)
Ke = zeros(n_basefuncs, n_basefuncs)
fe = zeros(n_basefuncs)
## Allocate global force vector f
f = zeros(ndofs(dh))
## Create an assembler
assembler = start_assemble(K, f)
## Loop over all cells
for cell in CellIterator(dh)
## Reinitialize cellvalues for this cell
reinit!(cellvalues, cell)
## Compute element contribution
assemble_element!(Ke, fe, cellvalues)
## Assemble Ke and fe into K and f
assemble!(assembler, celldofs(cell), Ke, fe)
end
return nothing
end
function assemble_global2(cellvalues::CellValues, K::SparseMatrixCSC, dh::DofHandler)
## Allocate the element stiffness matrix and element force vector
n_basefuncs = getnbasefunctions(cellvalues)
Ke = zeros(n_basefuncs, n_basefuncs)
fe = zeros(n_basefuncs)
## Allocate global force vector f
f = zeros(ndofs(dh))
## Create an assembler
assembler = start_assemble(K, f)
## Loop over all cells
for cell in CellIterator(dh.subdofhandlers[1])
## Reinitialize cellvalues for this cell
reinit!(cellvalues, cell)
## Compute element contribution
assemble_element!(Ke, fe, cellvalues)
## Assemble Ke and fe into K and f
assemble!(assembler, celldofs(cell), Ke, fe)
end
return nothing
end
@btime assemble_global(cellvalues, K, dh); # 494.943 ms (12 allocations: 7.65 MiB)
@btime assemble_global2(cellvalues, K, dh); # 655.668 ms (15 allocations: 7.65 MiB)
cc @kimauth
Did a quick benchmark, and I think this check: https://github.com/Ferrite-FEM/Ferrite.jl/blob/fe44e7f6297158e9bbb62fd4bc9adaa1d79929f2/src/Dofs/DofHandler.jl#L192 is the reason for the worse performance. At least it would make sense to provide a fast unsafe path IMO (or, a quick solution for CellCache is to just pass the sdh.dh instead). Theoretically, using the SubDofHandler could be faster by one lookup, because ndofs_per_cell is just one value...
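To illustrate that "one lookup" argument with a sketch (these structs, fields, and the toy_ndofs_per_cell function are purely illustrative, not Ferrite's actual internals):
# Toy layout, only to show why the SubDofHandler path can skip a lookup.
struct ToySubDofHandler
    cellset::Vector{Int}
    ndofs_per_cell::Int
end
struct ToyDofHandler
    subdofhandlers::Vector{ToySubDofHandler}
    cell_to_sdh::Vector{Int}  # cell id -> index of its subdofhandler
end
# Going through the DofHandler needs two lookups per cell ...
toy_ndofs_per_cell(dh::ToyDofHandler, cell::Int) =
    dh.subdofhandlers[dh.cell_to_sdh[cell]].ndofs_per_cell
# ... while a SubDofHandler already stores the single answer.
toy_ndofs_per_cell(sdh::ToySubDofHandler) = sdh.ndofs_per_cell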
Thanks for the pointer. If the performance issue is nothing conceptual, then I would leave the PR as it is (+- updating the deps) and fixing it is left for another PR.
Codecov Report
All modified and coverable lines are covered by tests. ✅
Project coverage is 93.86%. Comparing base (ccf583d) to head (bcae4a4).
@@            Coverage Diff             @@
##           master     #834      +/-   ##
==========================================
+ Coverage   93.84%   93.86%   +0.01%
==========================================
  Files          36       36
  Lines        5293     5310      +17
==========================================
+ Hits         4967     4984      +17
  Misses        326      326
Did a quick benchmark, and I think this check:
https://github.com/Ferrite-FEM/Ferrite.jl/blob/fe44e7f6297158e9bbb62fd4bc9adaa1d79929f2/src/Dofs/DofHandler.jl#L192
is the reason for the worse performance. At least it would make sense to provide a fast unsafe path IMO (or, a quick solution for CellCache is to just pass the sdh.dh instead).
Confirmed locally that dropping this check on this branch brings the performance close to each other:
julia> @btime assemble_global(cellvalues, K, dh); # 494.943 ms (12 allocations: 7.65 MiB)
478.872 ms (12 allocations: 7.65 MiB)
julia> @btime assemble_global2(cellvalues, K, dh); # 655.668 ms (15 allocations: 7.65 MiB)
499.271 ms (15 allocations: 7.65 MiB)
where the comment is the reference timing with the check included. Note that the timings on my machine are not super stable. Some of the remaining time difference can be attributed to the iteration of the OrderedSet, which increases the memory pressure. Hence I think it is still worth investigating the use of ranges in the iterators. Maybe we can add a boolean to the SubDofHandler with information about whether it covers a continuous range or not? Or we can take another look at https://github.com/Ferrite-FEM/Ferrite.jl/pull/625
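A rough sketch of that "continuous range" idea (the tighten helper is hypothetical, not Ferrite API): if the ids in an OrderedSet form a contiguous increasing block, iteration could fall back to a UnitRange, which costs no memory to iterate.
using OrderedCollections
# Return a UnitRange if the set is a contiguous increasing block of ids,
# otherwise return the ids as a Vector.
function tighten(set::OrderedSet{Int})
    v = collect(set)
    isempty(v) && return v
    r = first(v):last(v)
    return r == v ? r : v
end
tighten(OrderedSet(1:1_000_000))  # 1:1000000, cheap to iterate
tighten(OrderedSet([3, 7, 2]))    # [3, 7, 2], falls back to a Vector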
Theoretically, using the SubDofHandler could be faster by one lookup, because ndofs_per_cell is just one value...
If that is the case, then we should do it and force the CellIterator to iterate over the SubDofHandler in the case that there is only 1 subdofhandler.
Also, supersedes https://github.com/Ferrite-FEM/Ferrite.jl/pull/654/files
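A possible (hypothetical, not Ferrite API) way to do that forcing, relying only on the subdofhandlers field and the CellIterator(::SubDofHandler) constructor already used in the benchmark above:
using Ferrite
# Take the SubDofHandler fast path whenever there is exactly one subdofhandler.
function cell_iterator(dh::DofHandler)
    if length(dh.subdofhandlers) == 1
        return CellIterator(dh.subdofhandlers[1])
    else
        return CellIterator(dh)
    end
end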
Fixed up after #914.
Would probably be nice to automatically sort the sets when adding a regular set to the grid, but can be done later.
@KnutAM something isn't right here with the porous media example (probably something with FerriteMeshParser?), can you have a look?
FWIW, there have been quite a few improvements to the Base Dict that have not been carried over to OrderedDict. I am not sure this matters in practice, but:
- If someone is interested, it might be worth trying to bring those improvements over to OrderedDict.
- It is worth benchmarking again on newer Julia versions (1.11).
I'm just stuck with a feeling that this isn't the optimal solution, but I also don't have a better alternative right now.
I totally agree here on both points. I have tried different designs and none of them were really satisfactory to me.
Would probably be nice to automatically sort the sets when adding a regular set to the grid, but can be done later.
Why? Practically this only makes sense if the underlying grid has some ordering, as in our generated ones, and if your subdomain also has to follow this order. In meshes coming from generators such as Gmsh, the optimal ordering is almost always not the ordering imposed by the numbering of the elements, and you can mess up the performance if you sort them.
* It is worth benchmarking again on newer Julia versions (1.11).
The Set or the OrderedSet version?
I totally agree here on both points. I have tried different designs and none of them were really satisfactory to me.
What's wrong with this? We can relax the docs to say that getcellset etc. return a collection instead of explicitly OrderedSet if you want the freedom to change it in the future.
Would probably be nice to automatically sort the sets when adding a regular set to the grid
See the comments in the code. addcellset! etc. loop over the cells in order, so the order of the added set will match. If you bring your own collection then the order is preserved.
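A small example of the two behaviors described above (assuming addcellset! accepts a predicate and a Vector of cell ids as in the released API; the predicate and the ids are just for illustration):
using Ferrite
grid = generate_grid(Quadrilateral, (4, 4))
# Predicate version: cells are visited in grid order, so the stored set
# follows the grid numbering.
addcellset!(grid, "left_half", x -> x[1] ≤ 0.0)
# Bring-your-own collection: the given order is kept as-is.
addcellset!(grid, "custom", [3, 1, 2])
collect(getcellset(grid, "custom"))  # expected to be [3, 1, 2] per the comment above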
something isn't right here with the porous media example (probably something with FerriteMeshParser?), can you have a look?
Hopefully https://github.com/Ferrite-FEM/Ferrite.jl/pull/834/commits/14d880092c3890e077a5c7ce81b350c26d74c4b9 fixes it. I also tested locally that it works without the overload with https://github.com/Ferrite-FEM/FerriteMeshParser.jl/pull/43. That is a breaking release for FerriteMeshParser.jl though, so we need to validate that no other breaking changes are required to support Ferrite v1 before merging and releasing. Once that's done, I'll PR removing that hack, and do the same for FerriteGmsh.
We can can relax docs to say that getcellset etc return a collection instead of explicitly OrderedSet if you want the freedom to change it in the future.
I think it would make sense to say that it returns an ordered collection as I think this PR shows that we want to keep that.
Would probably be nice to automatically sort the sets when adding a regular set to the grid
See the comments in the code. addcellset! etc loop over the cells in order so the order of the added set will match. If you bring your own collection then the order is preserved.
Right, that makes sense, nvm!
What's wrong with this?
I thought OrderedSet stored the information twice compared to a Set, but I seem to be wrong, so maybe this is the optimal approach (of course, when possible a UnitRange is better but it is not always applicable, and a Vector is smaller but has slow in).
# Compare the memory footprint of Vector, Set, and OrderedSet for 1000 facet indices
struct Index
    t::NTuple{2,Int}
end
cells = rand(1:10000, 1000);
facets = rand(1:4, 1000);
inds = [Index((c, f)) for (c, f) in zip(cells, facets)];
julia> Base.summarysize.((inds, Set(inds), OrderedSet(inds)))
(16040, 35008, 32320)
If you need a vector we can expose the internal vector of the set.
IIRC one open issue here is that we cannot directly parallelize over the ordered set. We want to add a how-to for this.
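One possible pattern for such a how-to (a sketch, not the planned solution): since Threads.@threads needs an indexable iterator, the OrderedSet can first be materialized as a Vector, which preserves the iteration order and allows splitting the work into chunks.
using OrderedCollections
cellset = OrderedSet([10, 4, 7, 1])
cells = collect(cellset)  # Vector{Int}; iteration order is preserved
Threads.@threads for i in eachindex(cells)
    cellid = cells[i]
    # ... assemble the contribution of `cellid` into a thread-local buffer ...
end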