opm-simulators Ghost entries skipped for ILU apply and SpMV operator in all levels of AMG/CPR hierarchy.

PR contains two changes:

1. Skip ghost rows in the ILU apply and the SpMV operator inside the AMG and CPR smoothers. This is done by exploiting the parallel index set in the ILU class and in a new operator class. The change should reduce the execution time of parallel AMG/CPR simulations.
2. Allow the use of AMG with matrix-add-well-contributions=false. AMG works quite well on the Sleipner-CO2 case.
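
To illustrate the first change, here is a minimal sketch of a sparse matrix-vector product that skips ghost rows. It is not the code in this PR and does not use the Dune or OPM classes: the `CsrMatrix` struct, the `applyInteriorOnly` function and the `interiorRows` parameter are hypothetical names, and the sketch assumes that the local rows are ordered with interior (owned) rows first and ghost rows last.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical minimal CSR matrix, for illustration only.
// Assumption: rows [0, interiorRows) are interior, the remaining rows are ghost rows.
struct CsrMatrix {
    std::size_t rows = 0;
    std::vector<std::size_t> rowPtr;  // size rows + 1
    std::vector<std::size_t> colIdx;  // column index of each stored entry
    std::vector<double> values;       // value of each stored entry
};

// Computes y[i] = (A x)[i] for interior rows only.
// Ghost entries of y are left untouched; in a parallel run they would
// typically be filled by a separate communication step (assumption).
void applyInteriorOnly(const CsrMatrix& A,
                       const std::vector<double>& x,
                       std::vector<double>& y,
                       std::size_t interiorRows)
{
    assert(interiorRows <= A.rows);
    assert(A.rowPtr.size() == A.rows + 1);
    assert(y.size() == A.rows);

    for (std::size_t i = 0; i < interiorRows; ++i) {
        double sum = 0.0;
        for (std::size_t k = A.rowPtr[i]; k < A.rowPtr[i + 1]; ++k) {
            sum += A.values[k] * x[A.colIdx[k]];
        }
        y[i] = sum;
    }
}
```

The loop bound is the only difference from a plain CSR product, so the work saved is proportional to the number of entries stored in ghost rows.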

Nov 29 '22 11:11 andrthu

jenkins build this please

Nov 30 '22 08:11 alfbr

Sounds interesting! Not investigated it yet, do you have some ballpark measurements?

Dec 01 '22 07:12 atgeirr

Working on it :-)

Dec 01 '22 08:12 alfbr

Moving this to https://github.com/lisajulia/opm-simulators/tree/ilu-op-in-amg

Feb 02 '24 16:02 lisajulia

@alfbr: I took a look at this PR and was wondering whether, at that time, you did any measurements of the execution time reduction? Thanks! :)

Feb 06 '24 07:02 lisajulia

Yes, I did several measurements. The impact will of course depend on how many ghost cells there are compared to interior cells. While I do not remember exact figures, I believe that even on 8 to 16 processes on the open Norne model, you can get around 10% improvement in total execution time. Two caveats though: the assembly code has undergone optimizations since then, and I do not know how the improvement will translate to CPR. Still, this is maybe the lowest-hanging fruit when it comes to improving the scaling of CPR+AMG.
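
As a rough way to quantify that dependence, the share of local rows that are ghost rows can be computed per process. This is only a sketch: `ghostRowFraction` and its parameters are hypothetical names, and the fraction is an upper bound on the row work that can be skipped, not a prediction of the speedup.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical helper: fraction of the local rows on one process that are ghost rows.
// totalRows counts interior plus ghost rows; interiorRows counts interior rows only.
double ghostRowFraction(std::size_t totalRows, std::size_t interiorRows)
{
    assert(interiorRows <= totalRows);
    if (totalRows == 0) {
        return 0.0;  // empty partition: nothing to skip
    }
    return static_cast<double>(totalRows - interiorRows)
         / static_cast<double>(totalRows);
}
```

For example, a process holding 1000 local rows of which 750 are interior has a ghost fraction of 0.25.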

Feb 06 '24 08:02 alfbr

Thanks a lot! I also quickly compared the current master with and without this change on the Norne model, using --linear-solver=cpr. Is that the setting you were using as well? For me, on 10 processes, the current master was ~3% slower.

Can you let me know on which case you achieved the 10% improvement? Then I could recheck it against the current master. That will probably help in deciding how to proceed with this PR.

Feb 07 '24 15:02 lisajulia

> Thanks a lot! I also quickly compared the current master with and without this change on the Norne model, using --linear-solver=cpr. Is that the setting you were using as well? For me, on 10 processes, the current master was ~3% slower.

A 3% improvement on ten processes is a nice boost, and I would be very happy to see that materialize in master :) With a higher process count, I expect the numbers to improve.

> Can you let me know on which case you achieved the 10% improvement? Then I could recheck it against the current master. That will probably help in deciding how to proceed with this PR.

I am afraid I am not able to reproduce them. Those results were with the ILU implementation, and the assembly part has been optimized significantly since then. It was a different code base back then, as this is now years ago. Hence, any measurements will need to be rerun against the current master to give realistic numbers, just as you have done.

Feb 09 '24 10:02 alfbr