nuto
nuto copied to clipboard
Decoupling of assembler and cells
What's the problem?
Currently, the assemble has functions like
GlobalDofVector BuildVector(const Group<CellInterface>& cells, CellInterface::VectorFunction f) const;
and in there does things like calling cell.Integrate(f). These things (cells, functions-to-be-integrated) are not really part of assembly. Ideally it would just get a vector (matrix) and the numbering and insert them it the appropriate place into the global matrix.
Problems:
- Does more than it should: assembly + call
cell.Integrate() - Unnecessary dependency on
GrouporCellInterface - Can't assemble matrices that don't originate from cells
- Modifying the local matrix before assembly leads to duplicate code (
BuildDiagonallyLumpedMatrix)
Proposal
Instead of writing the cells and functions directly at the assembler, the assembler would get a generator that will return vector (matrix) + numbering when asked. The assembler does not, and should not, care how these are generated. Working with functions and cells could be a derived class / implementation of a generator base / concept. If additional transformations are to be applied to a local output (mass lumping, condensation), these can be performed in the generator.
Code
The return type of such a generator will be assumed to be using Entry = std::pair<Eigen::VectorXd, Eigen::VectorXi>;. A simple implementation could be
class EntryGenerator
{
public:
virtual Entry next() = 0;
virtual bool isValid() = 0;
};
which would allow the assembler to do something like this
while(generator.isValid())
{
Entry entry = generator.next();
globalVec[entry.second] = entry.first;
}
This is nice because it is dead simple. Another approach, for fans of syntactic sugar and range-based for loops:
class EntryGenerator
{
public:
virtual EntryIterator begin() = 0;
virtual EntryIterator end() = 0;
// ...
};
could result in nice syntax like this
for (Entry entry : generator)
{
globalVec[entry.second] = entry.first;
}
However, this results in a bit of code for the iterators. A complete example can be seen here.
Questions
- Do you like the idea?
- What would an
Entrylook like? - Dead simple generator vs iterators?
- Dynamic vs static polymorphism of generators?
<insert-your-concern-here>
Good! If I understand it correctly, based on the current implementation of the assembler: the loops over cells and dof types are then done via generator.next(). And the entry the generator returns is the result of cell.integrate(f)[someDof] and cell.DofNumbering(someDof), yes?
I totally agree, the assembler should only assemble a matrix/vector without performing the loop over the cells. We should however discuss, where exactly this class is stored. IMO, this belongs to the problem (see for example TimeDependentProblem).
I think there can be nothing bad be found about this idea. Splitting tasks and dependencies is great! Some thoughts:
- In your code example - which I honestly find quite complicated (will
.isValid()and.next()be easier?) - the fact of the lazy evaluation is IMO not present (or I really misunderstand the code, sorry then). It seems like the entryCellEntryGeneratorstores all its entries in a vector. This is not feasible for e.g. all dense element matrices. For a better understanding, could you maybe adapt your example to take a bunch of objects (-> our cell group) and a function (-> our MatrixFunction) and evaluates them, when they are needed? - In OldNuTo, this was the only loop that is OpenMP parallel. Do we want to think about that as well? It may require the iterator to be a random access iterator. No idea how that will work. But maybe: YAGNI when we focus on MPI parallelization.
- Would it be nice to have this whole loop internal, like
std::transform? We may end up with
GlobalSparseMatrix m;
auto ToGlobalMatrix = [&](Entry& e)
{
NuTo::Assemble(m, e);
// or
NuTo::Assemble(m, MakeLumpled(e));
};
NuTo::LazyEvaluateLoop(cells.begin(), cells.end(), HessianFunction, ToGlobalMatrix);
(We could then add something like an execution policy for OpenMP)
- Where would we put this code? In each main file? Only internally - e.g. wrapped in an
Assemblerclass? This may also influence the design.
- The generator stores these just for the example. I didn't want to write groups, cells and functions. Seems this was confusing, sorry for that. Of course I'm not proposing to store all the local matrices.
- Personally: YAGNI. If anyone is very interested in parallelization and sees a problem with this idea: I'm not set on these generators, I just want the decoupling.
- Looks good as well. A more complete example would be interesting.
- At first, I would just put it in the assembler:
// in main / EquationSystem / TimeDependentProblem / WhateverWrapperIsHipRightNow
CellEntryGenerator gen(cells, gradientFunction);
Assembler assembler(gen);
// in Assembler
GlobalVector globalVec;
for (Entry entry : mGenerator)
{
globalVec[entry.second] = entry.first;
}
Refactoring it out can happen in a second step if necessary.
Another IMO huge advantage could be a decoupling from CellInterface. On first glance, we could have a TimeDependentCell with methods like Gradient(t, dt) and Hessian(t, dt). This cell does not necessarily implement CellInterface. With corresponding entry generators like GraidentGenerator(cells, t, dt) and HessianGenerator(cells, t, dt) we could still assemble those.
I added some template stuff to the prototype of @Psirus. The main advantage: The methods begin() and end() that are the same for all classes, are now defined in the common interface. Someone, who wants to create his own generator, only needs to implement Next(), IsValid(), Get().
Code can be found here.
I implemented a version of that in this branch. There is, e.g. a DofMatrixGenerator that is the base class of CellMatrixEntries (Naming...?). Things I noticed:
- For now, we have to pass via nonconst reference, which is a bit bad.
- Why reference? Since
DofMatrixGeneratoris an abstract class, we cannot pass by value. - Why nonconst? Because
DofMatrixGenerator::Get()orDofMatrixGenerator::begin()are not const.Get(), for example, modifies the state. (We could make everything mutable...) - Why is this a problem? We cannot pass temporaries in the assembler method and always need some extra line that declare the generator.
- Why reference? Since
- For now, we cannot reuse the generator. Once
IsValid()is false, it is useless. Especially, in combination of having to pass it via reference, this is ugly. Introducing aReset()may solve the problem. But when is that called? At the beginning of the assembly? Manually? Or have aClone()method instead that returns a new valid instance? - For now, there is
CellMatrixEntries, CellVectorEntries, CellLumpedMatrixEntriesthat all work on theCellInterfaceand share a lot of code. But I cannot think of an easy refactoring. - For now, we have all this template code that is only specialized for
MatrixEntryandVectorEntry. So I feel like moving it frombasetomechanicsand specialize it for those types in a source file. Whymechanics? Because both entry types needmechanics/dofs/DofVector. [easy fix...]
The recent approach was to have some iterator class that performs the loop. So the assembler assembles only matrices and vectors but still performs the loop over all cells. I think, splitting that into features
- loop over cells (or something else that generates element contributions)
- adding an element contribution to a global matrix/vector
may simplify this a lot.
Point 1) may result in a very short (templates may be viable) class/function like
//! TFunction would return an equivalent of std::pair(Cell.Integrate(...), Cell.DofNumbering(...)
template <typename TBegin, typename TEnd, typename TFunction>
GlobalMatrix Assemble(TBegin begin, TEnd end, TFunction f, /*dof info, dof types...*/)
{
MatrixAssembler m;
for (;begin != end; ++begin)
m.Add(f(*begin));
return m.Get();
}
Or, for simplicity, be specialized for certain iterator types and TFunction.
For point 2) we can even think of the old NuTo way of moving the assembly to the global types like
GlobalMatrix::Add(DofMatrix<double>& elemOutput, DofMatrix<int>& dofNumbers);
GlobalVector::Add(DofVector<double>& elemOutput, DofMatrix<int>& dofNumbers);
GlobalVector::AddLumped(DofMatrix<double>& elemOutput, DofMatrix<int>& dofNumbers);
Or we put that into classes like MatrixAssembler/VectorAssembler/LumpedAssembler (as in the Assemble(..) code above) . This may make reusing of exising sparsity patterns easier.
We can split that into two, but I'm not sure how much we gain by that. Specifically, to the points in your post above:
- the
DofGeneratorcan not be a temporary - it can't be reused
still hold. So while I think splitting is fine, that wasn't "the problem", right?
As for 2): Reusing a sparsity pattern in the assembler seems a good idea. Making the vectors/matrices assemble themselves - I personally don't like that; should not be their concern imho.
I was thinking about the assembler that really assembles vectors/matrices. And I would like to implement that as soon as the RemoveJ/K team #164 finished. I am especially interested in reusing the sparsity patterns.
Basic interface
MatrixAssembler::MatrixAssembler(DofInfo = {});
void MatrixAssembler::Resize(DofInfo);
void MatrixAssembler::Add(DofMatrix<double> contribution, DofMatrix<int> numbering);
const DofMatrixSparse& MatrixAssembler::Get() const;
Internally, I would probably use vectors of Eigen::Triplet to efficiently build the matrix. In the Get() method, they would be transferred via setFromTriplets to the real sparse matrices (const attribute questionable). This would cover the current features and is decoupled from CellInterface. We can also think of an LumpedMatrixAssembler or have
void MatrixAssembler::AddLumped(DofVector<double>, DofVector<int>);
Reuse nonzeros
As a new feature, we can try to reuse the sparsity. So once the nonzeros are known - like after the first assembly run - we can assemble directly with coeffRef(), instead of the triplet lists. This is supposed to be slightly faster, more memory efficient and avoids the memory allocations. But I am not sure about the interface.
- My idea is to have a
bool mFinishedflag together with avoid Finalize/Finish/Complete/MakeCompressed()method. Naming help needed. So we expect the following flow:
MatrixAssembler assembler(); // mFinished = false;
assembler.Resize(dofInfo);
// first evaluation
for (auto cell : cells)
assembler.Add(cell.Integrate(f), cell.DofNumbering()); // builds into triplets since mFinished == false;
// DoStuff(assembler.Get()); throws, since mFinished == false
// Or Get() internally calls Finish() automatically...
assembler.Finish(); // mFinished = true, converts triplets to matrix, deallocates triplets.
DoStuff(assembler.Get()); // fine now
// second/third/... evaluation
assembler.SetZero();
for (auto cell : cells)
assembler.Add(cell.Integrate(f), cell.DofNumbering()); // builds directly into matrix since mFinished == true;
// no need to call assembler.Finish();
DoStuff(assembler.Get());
- Another idea is to allocate nonzeros before using
for (auto cell : cells)
assembler.AllocateNonzeros(cell.DofNumbering()); // into triplets;
- Or we introduce a method that copies the nonzeros, e.g. an overload
MatrixAssembler::Resize(DofMatrixSparse other);
- Or we even introduce two separate assemblers
DirectAssember assember(tripletListAssember.Get());
- Or, since the
coeffRefapproach does not require members, we can provide free functions for that.
auto hessian = tripletAssembler.Get();
//...
hessian.SetZero();
for (auto cell : cells)
Assemble(hessian, cell.Integrate(f), cell.DofNumbering());
But I would prefer the first (automatic using mFinished) or the last (no mFinished, assembler class == triplet lists, free function == coeffRef) solution.
Parallel assembly
The challenge is that the loop over the cells is now done outside of the assembler. Once the nonzeros are known, there is no reallocation involved. A simple mesh coloring (= MaximumIndependentSets) eliminates race conditions on parallel writes into the matrix.
The TripletList assembly is not that trivial. We could allocate into numThreads separate assemblers and add up the matrices. Or we concatenate the separate triplet lists. Or we have a separate ParallelAssembler that deals with that.
Or we postpone that.
Discussion
- Do you like the proposed interface?
- Naming intuitive enough?
- When/How to deal with parallel assembly?