Ghost entries skipped for ILU apply and SpMV operator in all levels of AMG/CPR hierarchy.
This PR contains two changes:
- Skip ghost rows in the ILU and SpMV operator inside the AMG and CPR smoothers. This is done by exploiting the parallel index set in the ILU class and in a new operator class. The changes should reduce the execution time of parallel AMG/CPR simulations.
- Allow the use of AMG with matrix-add-well-contributions=false. AMG works quite well on the Sleipner-CO2 case.
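To illustrate the idea behind the first change, here is a minimal, self-contained sketch of an SpMV that skips ghost rows. It is not the code in this PR: the `CsrMatrix` struct and `applyInteriorOnly` function are hypothetical names, and the sketch assumes that interior rows are numbered before ghost rows, so a single row count is enough to separate them. The actual change uses the parallel index set to identify ghost rows.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical minimal CSR matrix for illustration only.
// Assumption: rows [0, interiorRows) are owned by this process and
// rows [interiorRows, numRows) are ghost rows.
struct CsrMatrix {
    std::vector<std::size_t> rowPtr;  // size numRows + 1
    std::vector<std::size_t> colIdx;  // column index of each nonzero
    std::vector<double> values;       // value of each nonzero

    std::size_t numRows() const { return rowPtr.size() - 1; }
};

// Computes y = A * x on interior rows only. Ghost entries of y are left
// untouched, since their values come from the owning process anyway.
void applyInteriorOnly(const CsrMatrix& A, std::size_t interiorRows,
                       const std::vector<double>& x, std::vector<double>& y)
{
    assert(interiorRows <= A.numRows());
    assert(y.size() == A.numRows());

    for (std::size_t row = 0; row < interiorRows; ++row) {
        double sum = 0.0;
        for (std::size_t k = A.rowPtr[row]; k < A.rowPtr[row + 1]; ++k) {
            sum += A.values[k] * x[A.colIdx[k]];
        }
        y[row] = sum;
    }
}
```

The work saved per application is proportional to the number of nonzeros in the skipped ghost rows, so the benefit grows with the ratio of ghost to interior cells.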
jenkins build this please
Sounds interesting! I have not investigated it yet; do you have some ballpark measurements?
Working on it :-)
Moving this to https://github.com/lisajulia/opm-simulators/tree/ilu-op-in-amg
@alfbr: I took a look at this PR and was wondering whether you did any measurements of the execution time reduction at the time? Thanks! :)
Yes, I did several measurements. The impact will of course depend on how many ghost cells there are compared to interior cells. While I do not remember exact figures, I believe that even on 8-16 processes on the open Norne model, you can get around a 10% improvement in total execution time. Two caveats, though: the assembly code has undergone optimizations since then, and I do not know how the improvement will translate to CPR. Still, this is maybe the lowest-hanging fruit when it comes to improving the scaling of CPR+AMG.
Thanks a lot! I also quickly compared the current master with and without this change on the NORNE model, using --linear-solver=cpr. Is that the setting you were using as well? For me, on 10 processes, the current master was ~3% slower.
Could you let me know in which case you achieved the 10% improvement? Then I could recheck it against the current master. That will probably help in deciding how to proceed with this PR.
> Thanks a lot! I also quickly compared the current master with and without this change on the NORNE model, using --linear-solver=cpr. Is that the setting you were using as well? For me, on 10 processes, the current master was ~3% slower.
A 3% improvement on ten processes is a nice boost; I would be very happy to see that materialize in master :) With higher process counts, I expect the numbers to improve.
> Could you let me know in which case you achieved the 10% improvement? Then I could recheck it against the current master. That will probably help in deciding how to proceed with this PR.
I am afraid I am not able to reproduce them. Those results were with the ILU implementation, and the assembly part has been optimized significantly since then. It was a different code base back then, as this is now years ago. Hence, the measurements need to be rerun against current master to get realistic numbers, just as you have done.