New defaults
I think it is time to change some defaults in Flow related to accuracy and performance. Motivation:
- the $10^{-6}$ mb tolerance sometimes leads to too large accumulated mass-balance errors
- drift compensation sometimes leads to significant convergence issues (and is not needed if mb = $10^{-7}$)
- ilu0 leads to too many linear iterations for mb = $10^{-7}$
- for cases where we force small time steps (for instance using TUNING), 1 Newton iteration is often sufficient
I have run with these parameters for many cases for a long time.
This will need extensive testing before it can be merged.
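For concreteness, my understanding is that the proposal corresponds roughly to running flow with the options sketched below. The flag spellings are written from memory and should be treated as assumptions rather than part of this PR; the authoritative names and defaults are what `flow --help` reports.

```
flow CASE.DATA \
    --tolerance-mb=1e-7 \
    --enable-drift-compensation=false \
    --linear-solver=cprw \
    --newton-min-iterations=1
```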
jenkins build this please
not sure about the best way to do this. Do you think it might be easier to get things in if you split them among multiple PRs? The combined effects can be tricky to test.
> not sure about the best way to do this. Do you think it might be easier to get things in if you split them among multiple PRs? The combined effects can be tricky to test.
I see your point, but
- I would like a common discussion about the new defaults
- They are all interlinked (the fourth one could be taken independently)
- They all affect results, and it will be a large effort to check all the failed tests.
benchmark please
From our end, this is a highly welcome initiative. We will probably need some time to do proper testing as this will impact all models we run.
> for cases where we force small time steps (for instance using TUNING), 1 Newton iteration is often sufficient
The change to the minimum Newton iterations created some strange failures. As suggested by @GitPaean I will split that out into a separate PR.
There are 6 significant test failures that need to be addressed:
1) MICP: seg-faults with CPR.
2) 2D_POLYMER_INJECTIVITY: also has problems with CPR.
3-4) SPE1CASE1_BRINE and SPE1CASE1_PRECSALT: start oscillating and take forever to complete with mb=1e-7.
5-6) NETWORK-01-RESTART and NETWORK-01-REROUTE-RESTART: failures related to restart.
The rest are either minor or need more careful examination of the results.
The MICP case seg-faults in extractCPRPressureMatrix with CPRW (it works for the other variants). The MICP model is a one-phase (water) model with 5 components.
benchmark please
> SPE1CASE1_BRINE and SPE1CASE1_PRECSALT start oscillating and take forever to complete with mb=1e-7.

For these cases the simplest solution is to introduce a relaxed_tol_mb (à la TUNING record 2, item 7).
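To make the suggestion concrete, here is a minimal sketch of how a relaxed mass-balance tolerance could enter the convergence check. All names and the iteration cut-off are hypothetical and not the actual Flow convergence code; the intent is the same as the relaxed mass-balance error in the second TUNING record.

```cpp
// Minimal sketch (hypothetical names, not the actual Flow convergence code)
// of a relaxed mass-balance tolerance: enforce the strict value for the first
// Newton iterations and fall back to a relaxed value afterwards, so that
// oscillating cases like SPE1CASE1_BRINE do not grind on forever at mb = 1e-7.
#include <cmath>

struct MbTolerances
{
    double strict = 1.0e-7;   // proposed new default
    double relaxed = 1.0e-6;  // old default, reused here as the relaxed fallback
    int strictIterLimit = 8;  // assumed cut-off, analogous in spirit to TUNING
};

bool mbConverged(const double mbError, const int newtonIteration, const MbTolerances& tol)
{
    const double limit = (newtonIteration < tol.strictIterLimit) ? tol.strict : tol.relaxed;
    return std::abs(mbError) < limit;
}
```

With something along these lines, cases that converge quickly still honour the strict tolerance, and only the oscillating ones are accepted at the relaxed level.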
Posting missing ytelses results:
| Test | Configuration | Relative |
|---|---|---|
| opm-git | OPM Benchmark: drogon - Threads: 1 | 0.998 |
| opm-git | OPM Benchmark: drogon - Threads: 8 | 0.999 |
| opm-git | OPM Benchmark: punqs3 - Threads: 1 | 0.982 |
| opm-git | OPM Benchmark: punqs3 - Threads: 8 | 1.006 |
| opm-git | OPM Benchmark: smeaheia - Threads: 1 | 0.963 |
| opm-git | OPM Benchmark: smeaheia - Threads: 8 | 1 |
| opm-git | OPM Benchmark: spe10_model_1 - Threads: 1 | 1.003 |
| opm-git | OPM Benchmark: spe10_model_1 - Threads: 8 | 1.003 |
| opm-git | OPM Benchmark: flow_mpi_extra - Threads: 1 - FOIT (Total Oil Injection At End Of Run) | 1 |
| opm-git | OPM Benchmark: flow_mpi_extra - Threads: 8 - FOIT (Total Oil Injection At End Of Run) | 1 |
| opm-git | OPM Benchmark: flow_mpi_norne - Threads: 1 | 0.995 |
| opm-git | OPM Benchmark: flow_mpi_norne - Threads: 8 | 1.009 |
| opm-git | OPM Benchmark: flow_mpi_norne_4c_msw - Threads: 1 | 1.006 |
| opm-git | OPM Benchmark: flow_mpi_norne_4c_msw - Threads: 8 | 0.999 |
\* Speed-up = Total time master / Total time pull request. Above 1.0 is an improvement.
jenkins build this please
jenkins build this please
jenkins build this please
jenkins build this please
jenkins build this please
jenkins build this please
Maybe we should incorporate https://github.com/OPM/opm-simulators/pull/5280 into this PR. @totto82
benchmark please
Making a data update here, not intending to merge that without analysis, but this is to find test failures that are not "fixable" by new reference data, such as serial vs parallel comparison.
jenkins build this update_data please
> Making a data update here, not intending to merge that without analysis, but this is to find test failures that are not "fixable" by new reference data, such as serial vs parallel comparison.
Okay, but this is a fairly expensive approach given that this work affects more or less every existing regression test. For what it's worth, the new candidate solutions generated roughly 1.9 GB of binary files. Please hold additional attempts until other in-flight updates, e.g., #5232, have been merged.
Benchmark result overview:
| Test | Configuration | Relative |
|---|---|---|
| opm-git | OPM Benchmark: drogon - Threads: 1 | 1.004 |
| opm-git | OPM Benchmark: drogon - Threads: 8 | 0.819 |
| opm-git | OPM Benchmark: punqs3 - Threads: 1 | 1.013 |
| opm-git | OPM Benchmark: punqs3 - Threads: 8 | 0.961 |
| opm-git | OPM Benchmark: smeaheia - Threads: 1 | 1.003 |
| opm-git | OPM Benchmark: smeaheia - Threads: 8 | 0.876 |
| opm-git | OPM Benchmark: spe10_model_1 - Threads: 1 | 0.991 |
| opm-git | OPM Benchmark: spe10_model_1 - Threads: 8 | 0.993 |
| opm-git | OPM Benchmark: flow_mpi_extra - Threads: 1 | 0.999 |
| opm-git | OPM Benchmark: flow_mpi_extra - Threads: 8 | 0.897 |
| opm-git | OPM Benchmark: flow_mpi_norne - Threads: 1 | 1.008 |
| opm-git | OPM Benchmark: flow_mpi_norne - Threads: 8 | 0.952 |
| opm-git | OPM Benchmark: flow_mpi_norne_4c_msw - Threads: 1 | 0.99 |
| opm-git | OPM Benchmark: flow_mpi_norne_4c_msw - Threads: 8 | 0.946 |
\* Speed-up = Total time master / Total time pull request. Above 1.0 is an improvement.
View result details @ https://www.ytelses.com/opm/?page=result&id=2486
jenkins build this opm-tests=5157 please
jenkins build this opm-tests=5157 please
OPM/opm-tests#5157 does not exist and https://github.com/OPM/opm-tests/pull/1165 cannot be used due to merge conflicts. You should probably run update_data again to get a reference update PR.

> You should probably run update_data again to get a reference update PR.
Maybe. If we're "just" trying to determine which cases will not be cured by updating the reference solutions, then we can get a lot of information from the test failures of the latest jenkins run. Regression test cases named `mpi.compareECLFiles_flow+<CASE>` are those that would typically be fine when we update the reference solutions. The others are:
- `mpi.compareDamarisFiles_flow+SPE1CASE1`
- `mpi.compareParallelSim_flow+AQUFLUX-01`
- `mpi.compareParallelSim_flow+SPE1CASE2_SOLVENT`
- `mpi.compareRestartedSim_flow+NETWORK-01-REROUTE-RESTART`
- `mpi.compareRestartedSim_flow+NETWORK-01-RESTART`
- `mpi.python_fluidstate_variables`
- `mpi.python_primary_variables`
The Damaris case may be okay as well if we update the reference solutions. Finally, there is the case named `mpi.compareECLFiles_flow_blackoil_polyhedralgrid+SPE1CASE2`, which I don't know enough about without looking into the details.
I've started to run the tests for separate parts of the PR in an attempt to get a better overview. I don't know whether the --local-well-solve-control-switching option has ever been run on anything other than flow_blackoil, and I see for instance that the potential calculations for this option do not handle solvent. This should be a simple fix, but I will investigate a bit further to check for other possible implications.
I'm working on this and have gone through all failures that --local-well-solve-control-switching=true contributes to. This was a larger number (175) than anticipated, mainly due to differences in the potential calculations. Even when potentials aren't used or written to the summary, they are written to the restart file, so small differences are captured there.
With the current default, the potentials for standard wells are calculated by a local well solve followed by an explicit update using the updated pressure in the following lines:
https://github.com/OPM/opm-simulators/blob/cc6acb77221c729570bbc64f9de08b98f3ce0bdd/opm/simulators/wells/StandardWell_impl.hpp#L1554-L1557
I understand the intention here (to ameliorate errors due to the explicit treatment of the pressure drop along the well), but I did not include this second round of updates in computeWellPotentialsImplicit because I was uncertain about the behaviour of this fix when the equations are non-linear (e.g., cross-flow). It could easily be included, and would lead to fewer failures, but I'm not sure it's necessary. Any opinions on this, @GitPaean, @totto82 or others?
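For readers less familiar with this part of the code, here is a schematic sketch of the two-step procedure discussed above. It is not the StandardWell implementation; the toy single-connection model and all names are hypothetical and only illustrate a local well solve with an explicitly treated pressure drop, followed by one extra rate update once the pressure drop has been re-evaluated at the new rate.

```cpp
// Schematic illustration only (hypothetical names, not OPM code): a single
// connection producing against a BHP target, with a rate-dependent pressure
// drop along the well that is treated explicitly during the local solve.
#include <cmath>
#include <iostream>

struct ToyWell
{
    double reservoirPressure;  // pressure seen by the connection [bar]
    double productivityIndex;  // connection PI [sm3/day/bar]
    double bhpTarget;          // BHP limit used for the potential [bar]
};

// Rate-dependent pressure drop along the well (stands in for the friction /
// hydrostatic terms that make the well equations non-linear).
double pressureDrop(const double rate)
{
    return 5.0 + 1.0e-3 * rate * std::abs(rate);
}

// Step 1: local well solve with the pressure drop frozen (explicit treatment),
// e.g. at its value from the previously converged solution.
double localSolveExplicitDp(const ToyWell& w, const double laggedDp)
{
    return w.productivityIndex * (w.reservoirPressure - (w.bhpTarget + laggedDp));
}

// Step 2 (the update under discussion): re-evaluate the pressure drop at the
// rate from step 1 and update the rate once more with the updated pressure.
double explicitRateUpdate(const ToyWell& w, const double rateFromStep1)
{
    const double updatedDp = pressureDrop(rateFromStep1);
    return w.productivityIndex * (w.reservoirPressure - (w.bhpTarget + updatedDp));
}

int main()
{
    const ToyWell w{250.0, 2.0, 150.0};
    const double laggedDp = pressureDrop(0.0); // dp from a zero-rate "previous" state
    const double step1 = localSolveExplicitDp(w, laggedDp);
    const double step2 = explicitRateUpdate(w, step1);
    std::cout << "potential after local solve: " << step1
              << ", after explicit pressure-drop update: " << step2 << '\n';
}
```

The open question above is whether the analogue of the second step should also be done inside computeWellPotentialsImplicit; the sketch only shows what that extra update buys when the pressure drop depends on the rate.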
Otherwise, there is a large number (~50) of test failures due to different time-stepping (no clear advantage, just slightly different). Finally, there are 5 tests where --local-well-solve-control-switching=true seems to worsen the results. I will study these in a bit more detail.
> I understand the intention here (to ameliorate errors due to the explicit treatment of the pressure drop along the well), but I did not include this second round of updates in computeWellPotentialsImplicit because I was uncertain about the behaviour of this fix when the equations are non-linear (e.g., cross-flow). It could easily be included, and would lead to fewer failures, but I'm not sure it's necessary. Any opinions on this, @GitPaean, @totto82 or others?
For this part, how big is the difference in the well potential values? If the difference is only minor, I think you can decide. If it is more significant, maybe we should have a look.
> I understand the intention here (to ameliorate errors due to the explicit treatment of the pressure drop along the well), but I did not include this second round of updates in computeWellPotentialsImplicit because I was uncertain about the behaviour of this fix when the equations are non-linear (e.g., cross-flow).
If I remember correctly this improved the results for some particular wells in some models. But this was a long time ago, and I don't remember the model. Maybe it was Norne or SPE9? No strong opinions, as the difference should be small anyway.
@GitPaean @totto82 thanks! From what I can see, the differences are small compared to, e.g., changing the linear solver ;). At some point we discussed having a separate well_state for the potentials, which would avoid the copying and provide better initial guesses for the calculations (this would also give explicit p-drop calculations at the relevant pressure), but that is for later.