
Avoid assembling ghost row off-diagonals.

Open atgeirr opened this issue 5 years ago • 6 comments

The diagonal element will still be assembled. This reduces the number of non-zero blocks in the matrix in parallel.

atgeirr avatar Aug 14 '20 08:08 atgeirr

jenkins build this please

atgeirr avatar Aug 14 '20 08:08 atgeirr

IMHO, these kinds of optimizations are pretty much on the edge of the cost/benefit ratio (a bit over it, really). We are trying to squeeze out very small performance gains at the cost of generality and maintainability. The optimizations made in flow are at least contained in the application space (though they did, and still do, cost a lot of hours...), but here we are in a library, and there needs to be a very clear benefit to justify increasing the maintenance burden or even losing generality (actually the latter should never happen).

blattms avatar Aug 14 '20 09:08 blattms

here we are in a library, and there needs to be a very clear benefit to justify increasing the maintenance burden or even losing generality (actually the latter should never happen).

I see your point. However, should ghost row entries not be eliminated like this in general? Even if the parallel linear operators can deal with them, they do not need them and gain from their elimination. And the preconditioners require us to eliminate their influence one way or another. So should we not rather try to generalize this, and get things to scale better with the other discretization as well?

What I said above applies to ISTL at least, I think. Perhaps this issue depends on what framework you are using, and for PETSc, for example, a different way of thinking may be needed or correct. That is in any case beyond my expertise at the moment.

atgeirr avatar Aug 14 '20 10:08 atgeirr

I misread the code (I did not notice that the stencil is column-major and stayed in row-major mode in my head). Yes, you are right that the code is correct for cell-centered finite volumes and one level of overlap. For all other cases it is not.

The problem is that in general the handling of the processor boundaries depends on the discretization type and parallelization approach. There might also be preconditioners that depend on a symmetric sparsity pattern for performance/efficiency reasons.

blattms avatar Aug 17 '20 06:08 blattms

The problem is that in general the handling of the processor boundaries depends on the discretization type and parallelization approach. There might also be preconditioners that depend on a symmetric sparsity pattern for performance/efficiency reasons.

That indicates to me that the proper way of dealing with this is to keep the generic code "safe" (and not change it), while allowing specific code (for a specific discretization) to override the behaviour. If I continue this work, I'll try to move it to an "ecfv" context.

Thanks for pointing out that preconditioners might have their own requirements, do you know of any (currently implemented in istl or not) examples where this becomes an issue? In particular, I hope it is not an issue with AMG?

atgeirr avatar Aug 17 '20 06:08 atgeirr

Again: It depends on the discretization and the parallelization approach. Currently it will work with flow, but if you change the latter (and there have been discussions about that) it will not.

blattms avatar Aug 17 '20 07:08 blattms