nalu-wind icon indicating copy to clipboard operation
nalu-wind copied to clipboard

Factor out matrix graph construction from LinearSystem class

Open jhux2 opened this issue 5 years ago • 63 comments

The various linear system construction phases each create a matrix graph. This is redundant work, as almost all phases use the same connectivity graph. One exception is the "monolithic" momentum linear system.

This issue will track progress in factoring out the graph construction so that it is done as few times as possible per time step.

jhux2 avatar Dec 02 '19 21:12 jhux2

@sayerhs @lucbv @srajama1 @alanw0

jhux2 avatar Dec 02 '19 21:12 jhux2

Here a sample trace of the graph initialization calls from ablNeutralEdgeSegregated:

[EDIT: I forgot that the TpetraSegregatedLinearSystem class (TSLS) called through to different underlying graph methods. I've updated the call stack -- differences still remain.]

EquationSystems::initialize(): Begin
EQUATION LowMachEOSWrap initialize()
EQUATION MomentumEQS initialize()
  TSLS::buildFaceElemToNodeGraph
    CrsGraph::buildFaceElemToNodeGraph
  TSLS::buildFaceToNodeGraph
    CrsGraph::buildConnectedNodeGraph
  TSLS::buildEdgeToNodeGraph
    CrsGraph::buildConnectedNodeGraph
  TSLS::buildFacetoNodeGraph
    CrsGraph::buildNodeGraph
  TSLS::buildFacetoNodeGraph
    CrsGraph::buildNodeGraph
EQUATION ContinuityEQS initialize()
  CrsGraph::buildEdgeToNodeGraph
  CrsGraph::buildConnectedNodeGraph
EQUATION EnthalpyEQS initialize()
  CrsGraph::buildFaceToNodeGraph
  CrsGraph::buildConnectedNodeGraph
  CrsGraph::buildFaceToNodeGraph
  CrsGraph::buildConnectedNodeGraph
  CrsGraph::buildEdgeToNodeGraph
  CrsGraph::buildConnectedNodeGraph
  CrsGraph::buildNodeGraph
  CrsGraph::buildNodeGraph
EQUATION TurbKineticEnergyEQS initialize()
  CrsGraph::buildEdgeToNodeGraph
  CrsGraph::buildConnectedNodeGraph
  CrsGraph::buildNodeGraph
EquationSystems::initialize(): End

Working under the assumption that the segregated momentum and continuity have the same matrix graph structure, I was expecting that the Momentum and Continuity phases should have the same call stack.

@sayerhs @alanw0 Is this expected behavior?

jhux2 avatar Dec 12 '19 01:12 jhux2

@jhux2 buildConnectedNodeGraph is the actual function that is called by specific graph building methods

https://github.com/Exawind/nalu-wind/blob/b130d75819829c1cc70a7f2b02404919007ee4c8/src/TpetraSegregatedLinearSystem.C#L382-L386

In the ABL problem we do have different BCs applied to different equation systems, that is why the graph calls look different. However, if you look at the final resulting graph I believe they will have the same non-zero columns for each row.

sayerhs avatar Dec 12 '19 20:12 sayerhs

@sayerhs Apart from buildConnectedNodeGraph, are all the other calls for handling boundary conditions?

In general, is there a way of determining during runtime which physics systems will have the same graph structure, or is this something that can only be known by someone who knows the physics of the simulation?

jhux2 avatar Dec 12 '19 20:12 jhux2

Status update

I fixed what ended up being a simple bug in TpetraSegregatedLinearSystem that was causing seg faults and/or bad convergence. Running ablNeutralEdgeSegregated in two different ways yields:

use_edges: no: continuity and scalar momentum matrix graphs are identical use_edges:yes: scalar momentum matrix graph has more entries than continuity graph

Afaik, figuring out why the graphs are different for the yes case and fixing that is the last hurdle to being able to use a single graph.

jhux2 avatar Dec 18 '19 17:12 jhux2

It looks like the momentum wall function implementation is using the full face element connectivity, more than what continuity uses at the wall. We could maybe pad the connectivity around the boundaries to always have the full face connectivity, or we could drop connections for the wall function

rcknaus avatar Dec 18 '19 17:12 rcknaus

@rcknaus thanks for the explanation. Would it be possible to run all the boundary condition algorithms (for momentum, continuity, etc...) on the "master graph" so that we have all connectivity in place. We could then use the master graph for all the matrices in the system and simply put default zeroes in entries that are not needed say in the continuity matrix in the case of ablNeutralEdge?

lucbv avatar Dec 18 '19 17:12 lucbv

Yeah, I think that would be a good solution.

rcknaus avatar Dec 18 '19 17:12 rcknaus

@lucbv If we revert to a single graph, won't all initialize_connectivity go through the same graph creation logic and end up the way you describe? @jhux2 had expressed some performance concerns with this approach. I suspect it is not that much but good to test that out.

sayerhs avatar Dec 18 '19 19:12 sayerhs

@lucbv wrote:

Would it be possible to run all the boundary condition algorithms (for momentum, continuity, etc...) on the "master graph" so that we have all connectivity in place. We could then use the master graph for all the matrices in the system and simply put default zeroes in entries that are not needed say in the continuity matrix in the case of ablNeutralEdge?

@sayerhs suggested something along these lines.

jhux2 avatar Dec 18 '19 19:12 jhux2

It looks like the momentum wall function implementation is using the full face element connectivity, more than what continuity uses at the wall. We could maybe pad the connectivity around the boundaries to always have the full face connectivity, or we could drop connections for the wall function

@rcknaus I looked in LowMachEquationSystem.C to see if there was an obvious way of changing which graph is created, but I don't understand the logic well enough. I'm guessing that initialize_connectivity() needs to call a different graph setup, but what else has to happen?

jhux2 avatar Dec 18 '19 20:12 jhux2

@jhux2 initialize_connectivity is called in https://github.com/Exawind/nalu-wind/blob/73cb4b961434f552b52ccd037281406ca0e247ec/src/SolverAlgorithmDriver.C#L70

which then accesses linsys_ for the corresponding equation system https://github.com/Exawind/nalu-wind/blob/master/src/LowMachEquationSystem.C#L1016

and then calls the build*Graph methods. The first step would be to separate out graph from linear system (or have the ability for LinearSystem::create to take a graph argument and use that instead of creating a graph inside. If LinearSystem can be made to have the same graph instance, then the equation systems are oblivious to it.

For the ABL neutral edge regression tests, based on the physics all I can say is that the momentum graph is a superset of pressure graph.

sayerhs avatar Dec 18 '19 21:12 sayerhs

@sayerhs Actually, I've already factored out the crsgraph from LinearSystem. See my branch https://github.com/jhux2/nalu-wind/tree/crsgraph-refactor.

My question is what needs to change in addition to creating a graph with more connectivity for the physics phases other than momentum.

jhux2 avatar Dec 18 '19 21:12 jhux2

Would it be helpful to create a PR at this point?

jhux2 avatar Dec 18 '19 21:12 jhux2

@jhux2 I am not sure I understand your question. If every linsys_ instance has the same graph instance, then once all the equation systems call initialize() method, the resultant graph will have the connectivity of all the physics involved across all equation systems. There is nothing else that needs to be done to the rest of the code.

Are you asking about the overhead of build*Graph calls that are repeated in each equation system that will add duplicate connections? If so, that is a bit tricky because of several combinations of if-conditions that nalu supports currently. But for certain conditions (e.g., use_edges: True) we only need to do buildEdgeToNodeGraph and buildElemToNodeGraph etc once as they are on the same mesh parts. The only ones we will need to repeat over all equation systems is the boundary conditions, but since they are faces they should be of smaller impact at least for the first proof of concept.

I think a PR would be useful to look at your code and discuss.

sayerhs avatar Dec 18 '19 21:12 sayerhs

I am not sure I understand your question. If every linsys_ instance has the same graph instance, then once all the equation systems call initialize() method, the resultant graph will have the connectivity of all the physics involved across all equation systems. There is nothing else that needs to be done to the rest of the code.

The current state of my PR is that there is now a separate CrsGraph class. In each physics phase, the linear system construction first creates a CrsGraph, then creates a matrix.

What I've observed is that the graph sparsity varies for the segregated momentum, depending on whether use_edges is true or false.

Before having all physics phases depend on a single CrsGraph (which is the next logical step), I'd like to get the code in a state where the segregated momentum graph has the same sparsity as continuity, regardless of the value of use_edges. My question is whether this is just a matter of calling the correct method CrsGraph::build*Graph, or if there's something with BCs that needs to change.

jhux2 avatar Dec 18 '19 22:12 jhux2

@jhux2 OK. Would it not be possible by simply changing the boundary conditions in the input file to be dirichlet instead of wall function? Can you explain your motivation for changing the code?

If you want to do change the code, then you can capture the parts in register_*_bc calls within the equation systems and then manually call linsys_->buildFaceToNodeGraph on those part vectors.

Finally, when use_edges: yes, the current nalu-wind implementation adds extra connections to the graph even though it is not used at all. Those entries just end up as zeros in momentum system.

sayerhs avatar Dec 18 '19 22:12 sayerhs

@sayerhs My goal is just to get the momentum graph to be the same, regardless of use_edges. If this can be done via the input deck and without changing code, then I'm a happy guy.

jhux2 avatar Dec 18 '19 22:12 jhux2

Confirmed that graphs match for the following when activating the Tpetra segregated momentum solve:

periodic3dEdge

-rw-rw-r-- 1 jhu jhu 798902 Dec 18 17:52 ContinuityEQS-O-0.gra.1.0
-rw-rw-r-- 1 jhu jhu 798902 Dec 18 17:52 ContinuityEQS-O-1.gra.1.0
-rw-rw-r-- 1 jhu jhu 798902 Dec 18 17:52 EnthalpyEQS-O-0.gra.1.0
-rw-rw-r-- 1 jhu jhu 798902 Dec 18 17:52 EnthalpyEQS-O-1.gra.1.0
-rw-rw-r-- 1 jhu jhu 798902 Dec 18 17:52 MomentumEQS-O-0.gra.1.0
-rw-rw-r-- 1 jhu jhu 798902 Dec 18 17:52 MomentumEQS-O-1.gra.1.0

periodic3dElem

-rw-rw-r-- 1 jhu jhu 1576766 Dec 18 17:40 ContinuityEQS-O-0.gra.1.0
-rw-rw-r-- 1 jhu jhu 1576766 Dec 18 17:40 ContinuityEQS-O-1.gra.1.0
-rw-rw-r-- 1 jhu jhu 1576766 Dec 18 17:40 EnthalpyEQS-O-0.gra.1.0
-rw-rw-r-- 1 jhu jhu 1576766 Dec 18 17:40 EnthalpyEQS-O-1.gra.1.0
-rw-rw-r-- 1 jhu jhu 1576766 Dec 18 17:40 MomentumEQS-O-0.gra.1.0
-rw-rw-r-- 1 jhu jhu 1576766 Dec 18 17:40 MomentumEQS-O-1.gra.1.0

jhux2 avatar Dec 19 '19 00:12 jhux2

Graphs also match between different phyics for the airfoilRANSEdge and airfoilRANSElem, respectively, when using the segregated momentum solver.

jhux2 avatar Dec 19 '19 01:12 jhux2

@jhux2 For testing purposes, one solution would be https://github.com/rcknaus/nalu-wind/commit/fb984a8fb81472c2c0ef8578ee6be8a5733f3718 , where I just force all boundary conditions to use the same stencil regardless of whether it's necessary or not.

Which graph methods are called in initialize_connectivity() depends on the type of SolverAlgorithms instantiated in the register_*_algorithm sections. For the abl neutral case, there's a "face elem" algorithm (buildFaceElemToNodeGraph) at the "top" bc and a "face" algorithm (buildFaceToNodeGraph) at the bottom. For continuity, it's a no-op for both of those condtions.

rcknaus avatar Dec 19 '19 07:12 rcknaus

Would it be possible to run all the boundary condition algorithms (for momentum, continuity, etc...) on the "master graph" so that we have all connectivity in place. We could then use the master graph for all the matrices in the system and simply put default zeroes in entries that are not needed say in the continuity matrix in the case of ablNeutralEdge?

I've returned to working on this. It appears that we'll need something like @lucbv suggested. Different physics are creating different stencils. For example, below is the ablNeutralEdge regression test, where one can see different graph calls for the different physics, and reusing the graph between continuity and enthalpy fails on 8 MPI ranks.

Question: Would it be reasonable to do what @rcknaus said he does for testing, i.e., force all BCs to use the same stencil?

EQUATION MomentumEQS
CrsGraph::buildFaceElemToNodeGraph
CrsGraph::buildFaceToNodeGraph
CrsGraph::buildConnectedNodeGraph
CrsGraph::buildEdgeToNodeGraph
CrsGraph::buildConnectedNodeGraph
CrsGraph::buildNodeGraph
CrsGraph::buildNodeGraph
EQUATION ContinuityEQS
CrsGraph::buildEdgeToNodeGraph
CrsGraph::buildConnectedNodeGraph
EQUATION EnthalpyEQS
CrsGraph::buildFaceToNodeGraph
CrsGraph::buildConnectedNodeGraph
CrsGraph::buildFaceToNodeGraph
CrsGraph::buildConnectedNodeGraph
CrsGraph::buildEdgeToNodeGraph
CrsGraph::buildConnectedNodeGraph
CrsGraph::buildNodeGraph
CrsGraph::buildNodeGraph
EQUATION TurbKineticEnergyEQS
CrsGraph::buildEdgeToNodeGraph
CrsGraph::buildConnectedNodeGraph
CrsGraph::buildNodeGraph
TLS::finalizeLinearSystem
EquationSystems::initialize(): End

jhux2 avatar Jan 29 '20 19:01 jhux2

@jhux2 if you call buildElemToNodeGraph for interior and buildFaceElemToNodeGraph for boundaries, all these should collapse into the same graph.

However, can you explain what you mean by continuity/enthalpy failing? Do you mean you only run the build* methods once for Continuity and then reuse that for Enthalpy without letting Enthalpy call the build* methods too?

sayerhs avatar Jan 29 '20 19:01 sayerhs

@sayerhs would it then make sense to only use these two graphs and remove the others? It would simplify the code base and probably would not cost much more computationally.

Jonathan is trying to build the CrsGraph once per time step, the different equations in the system would then share the same graph

lucbv avatar Jan 29 '20 19:01 lucbv

However, can you explain what you mean by continuity/enthalpy failing? Do you mean you only run the build* methods once for Continuity and then reuse that for Enthalpy without letting Enthalpy call the build* methods too?

If I allow Continuity and Enthalpy to each build their own CrsGraph, the test passes. If Enthalpy uses the graph that Continuity built, the test fails.

jhux2 avatar Jan 29 '20 19:01 jhux2

@jhux2 Thanks, I think the solution would then be to use buildElemToNodeGraph and buildFaceElemToNodeGraph for all use cases and then we will have the full connectivity for all the different possible BCs across all equation systems. I think this is along the lines of what @rcknaus suggested in a previous post.

The question for you and @lucbv is: what happens if the stencil for a row has 27 entries but only 8 are non-zero but the rest are all zeros. Would there be a performance impact? I suspect the savings from graph reinit will more than make up for this, but thought I'd ask.

sayerhs avatar Jan 29 '20 19:01 sayerhs

@sayerhs usually I am happy to trade a little more flop if I can save some communication. In this case the overall increase of entries in the matrix is modest enough at large scale that it would not be very noticeable. On smaller meshes where the extrior/interior ratio gets high you could argue that less entries is better, but that's not the regime we are targeting.

lucbv avatar Jan 29 '20 19:01 lucbv

By the way if you call buildElemToNodeGraph and buildFaceElemToNodeGraph you do end up with the extra zero entries anyway right?

lucbv avatar Jan 29 '20 19:01 lucbv

Thanks, I think the solution would then be to use buildElemToNodeGraph and buildFaceElemToNodeGraph for all use cases and then we will have the full connectivity for all the different possible BCs across all equation systems. I think this is along the lines of what @rcknaus suggested in a previous post.

@sayerhs Thank you. Would it be sufficient to use @rcknaus's patch, https://github.com/rcknaus/nalu-wind/commit/fb984a8fb81472c2c0ef8578ee6be8a5733f3718?

jhux2 avatar Jan 29 '20 19:01 jhux2

@lucbv Yeah, that's why I was asking about the potential performance impacts. If we replace buildEdgeToNodeGraph with buildElemToNodeGraph that will increase the stencil for an internal DOF from 7 to 27 but most of those will be zeros when using edge algorithms. Similarly, FaceElemToNode will add connectivity to internal nodes when processing boundary faces which are not used for several of the BCs.

That patch is a quick way to test this, but think it would be cleaner to replace the implementation so buildEdgeToNodeGraph etc. in the CrsGraph class. It doesn't make sense to loop over the mesh and add those other connections via SolverAlgorithm::initialize_connectivity() if we are going to call buildElemToNodeGraph anyway.

sayerhs avatar Jan 29 '20 20:01 sayerhs