
New defaults

Open totto82 opened this issue 1 year ago • 32 comments

I think it is time to change some defaults in Flow related to accuracy and performance. Motivation:

  • $10^{-6}$ mb tol sometimes leads to too large accumulated mass balance errors
  • drift compensation sometimes leads to significant convergence issues (and is not needed if mb = $10^{-7}$)
  • ilu0 leads to too many iterations for mb $10^{-7}$
  • for cases where we force small timesteps (for instance using TUNING), 1 Newton iteration is often sufficient

I have run with these parameters for many cases for a long time.
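For illustration, the proposal corresponds roughly to forcing these settings explicitly on the command line today (a sketch only; the exact flag spellings below are assumptions based on Flow's parameter naming, so check `flow --help` for the authoritative names):

```sh
# Hypothetical invocation approximating the proposed defaults:
# tighter mb tolerance, drift compensation off, a CPR-type linear
# solver instead of ilu0, and a minimum of one Newton iteration.
flow CASE.DATA \
  --tolerance-mb=1e-7 \
  --enable-drift-compensation=false \
  --linear-solver=cprw \
  --newton-min-iterations=1
```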

This will need extensive testing before it can be merged.

totto82 avatar Feb 02 '24 15:02 totto82

jenkins build this please

totto82 avatar Feb 02 '24 15:02 totto82

not sure about the best way to do this. Do you think it might be easier to get things in if you split them among multiple PRs? The combined effects can be tricky to test.

GitPaean avatar Feb 02 '24 15:02 GitPaean

not sure about the best way to do this. Do you think it might be easier to get things in if you split them among multiple PRs? The combined effects can be tricky to test.

I see your point, but

  1. I would like a common discussion about the new defaults.
  2. They are all interlinked (the fourth one could be taken independently).
  3. They all affect results, and it will be a large effort to check all the failed tests.

totto82 avatar Feb 05 '24 07:02 totto82

benchmark please

totto82 avatar Feb 05 '24 07:02 totto82

From our end, this is a highly welcome initiative. We will probably need some time to do proper testing as this will impact all models we run.

alfbr avatar Feb 05 '24 08:02 alfbr

for cases where we force small timesteps (for instance using TUNING), 1 Newton iteration is often sufficient

The change to the minimum Newton iterations created some strange failures. As suggested by @GitPaean I will split that out into another PR.

There are 6 significant test failures that need to be addressed.

  1. MICP. Seg-faults with CPR.
  2. 2D_POLYMER_INJECTIVITY also has problems with CPR.
  3-4. SPE1CASE1_BRINE and SPE1CASE1_PRECSALT start oscillating and take forever to complete with mb=1e-7.
  5-6. NETWORK-01-RESTART and NETWORK-01-REROUTE-RESTART, related to restart.

The remaining failures are either minor or need more careful examination of the results.

totto82 avatar Feb 06 '24 08:02 totto82

The MICP case seg-faults in extractCPRPressureMatrix with CPRW (it works for the other variants). The MICP model is a one-phase (water) model with 5 components.
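For illustration only (hypothetical names, and just one plausible failure mode): CPR-type pressure extraction assumes a valid pressure-variable index within each matrix block, which a one-phase, five-component block layout could violate. A defensive bounds check along these lines would at least turn a crash into a diagnosable error:

```cpp
#include <stdexcept>
#include <string>

// Hypothetical guard; 'pressureVarIndex' and 'blockRows' mirror (but
// are not) the quantities used inside extractCPRPressureMatrix.
void checkPressureIndex(int pressureVarIndex, int blockRows)
{
    if (pressureVarIndex < 0 || pressureVarIndex >= blockRows) {
        throw std::logic_error(
            "CPR pressure extraction: pressure index "
            + std::to_string(pressureVarIndex)
            + " outside matrix block of size "
            + std::to_string(blockRows));
    }
}
```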

totto82 avatar Feb 06 '24 11:02 totto82

benchmark please

alfbr avatar Feb 06 '24 20:02 alfbr

SPE1CASE1_BRINE and SPE1CASE1_PRECSALT start oscillating and take forever to complete with mb=1e-7.

For these cases the simplest solution is to introduce a relaxed_tol_mb (à la TUNING 2-7).
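For illustration, a minimal sketch of how such a relaxed tolerance could be applied (names are hypothetical; the pattern mirrors TUNING record 2, which carries both a target and a maximum mass-balance error, e.g. TRGMBE/XXXMBE):

```cpp
#include <cmath>

// Sketch only: accept the strict tolerance early in the Newton loop,
// and fall back to a relaxed tolerance once the iteration count grows,
// instead of cutting the timestep for oscillating cases.
struct MassBalanceTolerances {
    double strict  = 1.0e-7; // target tolerance for normal convergence
    double relaxed = 1.0e-6; // fallback after many Newton iterations
};

bool massBalanceConverged(double mbError, int newtonIter,
                          int strictIterLimit,
                          const MassBalanceTolerances& tol)
{
    const double limit =
        (newtonIter <= strictIterLimit) ? tol.strict : tol.relaxed;
    return std::abs(mbError) <= limit;
}
```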

totto82 avatar Feb 07 '24 14:02 totto82

Posting missing ytelses results:

| Test | Configuration | Relative |
| --- | --- | --- |
| opm-git | OPM Benchmark: drogon - Threads: 1 | 0.998 |
| opm-git | OPM Benchmark: drogon - Threads: 8 | 0.999 |
| opm-git | OPM Benchmark: punqs3 - Threads: 1 | 0.982 |
| opm-git | OPM Benchmark: punqs3 - Threads: 8 | 1.006 |
| opm-git | OPM Benchmark: smeaheia - Threads: 1 | 0.963 |
| opm-git | OPM Benchmark: smeaheia - Threads: 8 | 1 |
| opm-git | OPM Benchmark: spe10_model_1 - Threads: 1 | 1.003 |
| opm-git | OPM Benchmark: spe10_model_1 - Threads: 8 | 1.003 |
| opm-git | OPM Benchmark: flow_mpi_extra - Threads: 1 - FOIT (Total Oil Injection At End Of Run) | 1 |
| opm-git | OPM Benchmark: flow_mpi_extra - Threads: 8 - FOIT (Total Oil Injection At End Of Run) | 1 |
| opm-git | OPM Benchmark: flow_mpi_norne - Threads: 1 | 0.995 |
| opm-git | OPM Benchmark: flow_mpi_norne - Threads: 8 | 1.009 |
| opm-git | OPM Benchmark: flow_mpi_norne_4c_msw - Threads: 1 | 1.006 |
| opm-git | OPM Benchmark: flow_mpi_norne_4c_msw - Threads: 8 | 0.999 |

* Speed-up = Total time master / Total time pull request. Above 1.0 is an improvement.

blattms avatar Feb 12 '24 11:02 blattms

jenkins build this please

totto82 avatar Feb 20 '24 14:02 totto82

jenkins build this please

totto82 avatar Mar 05 '24 11:03 totto82

jenkins build this please

akva2 avatar Mar 05 '24 11:03 akva2

jenkins build this please

totto82 avatar May 15 '24 11:05 totto82

jenkins build this please

totto82 avatar May 24 '24 08:05 totto82

jenkins build this please

GitPaean avatar May 24 '24 09:05 GitPaean

maybe we should incorporate https://github.com/OPM/opm-simulators/pull/5280 into this PR. @totto82

GitPaean avatar May 24 '24 09:05 GitPaean

benchmark please

totto82 avatar May 24 '24 12:05 totto82

Making a data update here, not intending to merge that without analysis, but this is to find test failures that are not "fixable" by new reference data, such as serial vs parallel comparison.

atgeirr avatar May 27 '24 09:05 atgeirr

jenkins build this update_data please

atgeirr avatar May 27 '24 09:05 atgeirr

Making a data update here, not intending to merge that without analysis, but this is to find test failures that are not "fixable" by new reference data, such as serial vs parallel comparison.

Okay, but this is a fairly expensive approach given that this work affects more or less every existing regression test. For what it's worth, the new candidate solutions generated roughly 1.9 GB of binary files. Please hold additional attempts until other in-flight updates, e.g., #5232, have been merged.

bska avatar May 27 '24 15:05 bska

Benchmark result overview:

| Test | Configuration | Relative |
| --- | --- | --- |
| opm-git | OPM Benchmark: drogon - Threads: 1 | 1.004 |
| opm-git | OPM Benchmark: drogon - Threads: 8 | 0.819 |
| opm-git | OPM Benchmark: punqs3 - Threads: 1 | 1.013 |
| opm-git | OPM Benchmark: punqs3 - Threads: 8 | 0.961 |
| opm-git | OPM Benchmark: smeaheia - Threads: 1 | 1.003 |
| opm-git | OPM Benchmark: smeaheia - Threads: 8 | 0.876 |
| opm-git | OPM Benchmark: spe10_model_1 - Threads: 1 | 0.991 |
| opm-git | OPM Benchmark: spe10_model_1 - Threads: 8 | 0.993 |
| opm-git | OPM Benchmark: flow_mpi_extra - Threads: 1 | 0.999 |
| opm-git | OPM Benchmark: flow_mpi_extra - Threads: 8 | 0.897 |
| opm-git | OPM Benchmark: flow_mpi_norne - Threads: 1 | 1.008 |
| opm-git | OPM Benchmark: flow_mpi_norne - Threads: 8 | 0.952 |
| opm-git | OPM Benchmark: flow_mpi_norne_4c_msw - Threads: 1 | 0.99 |
| opm-git | OPM Benchmark: flow_mpi_norne_4c_msw - Threads: 8 | 0.946 |

* Speed-up = Total time master / Total time pull request. Above 1.0 is an improvement.

View result details @ https://www.ytelses.com/opm/?page=result&id=2486

ytelses avatar May 27 '24 15:05 ytelses

jenkins build this opm-tests=5157 please

totto82 avatar May 28 '24 06:05 totto82

jenkins build this opm-tests=5157 please

OPM/opm-tests#5157 does not exist, and https://github.com/OPM/opm-tests/pull/1165 cannot be used due to merge conflicts. You should probably run update_data again to get a reference update PR.

GitPaean avatar May 28 '24 10:05 GitPaean

You should probably run update_data again to get a reference update PR.

Maybe. If we're "just" trying to determine which cases will not be cured by updating the reference solutions, then we can get a lot of information from the test failures of the latest Jenkins run. Regression test cases named mpi.compareECLFiles_flow+<CASE> are those that would typically be fine when we update the reference solutions. The others are

  • mpi.compareDamarisFiles_flow+SPE1CASE1
  • mpi.compareParallelSim_flow+AQUFLUX-01
  • mpi.compareParallelSim_flow+SPE1CASE2_SOLVENT
  • mpi.compareRestartedSim_flow+NETWORK-01-REROUTE-RESTART
  • mpi.compareRestartedSim_flow+NETWORK-01-RESTART
  • mpi.python_fluidstate_variables
  • mpi.python_primary_variables

The Damaris case may be okay as well if we update the reference solutions. Finally, there is the case named mpi.compareECLFiles_flow_blackoil_polyhedralgrid+SPE1CASE2, which I don't know enough about without looking into the details.

bska avatar May 28 '24 11:05 bska

I've started to run the tests for separate parts of the PR in an attempt to get a better overview. I don't know whether the --local-well-solve-control-switching option has ever been run on anything other than flow_blackoil, and I see, for instance, that the potential calculations for this option do not handle solvent. It should be a simple fix, but I will investigate a bit further to check for other possible implications.

steink avatar May 30 '24 09:05 steink

I'm working on this and have gone through all failures that --local-well-solve-control-switching=true contributes to. This was a larger number (175) than anticipated, mainly due to differences in potential calculations. Even when potentials aren't used or written to summary, they are written to restart, so small differences are captured there.

With the current default, the potentials for standard wells are calculated by a local well solve followed by an explicit update using the updated pressure in the following lines: https://github.com/OPM/opm-simulators/blob/cc6acb77221c729570bbc64f9de08b98f3ce0bdd/opm/simulators/wells/StandardWell_impl.hpp#L1554-L1557 I understand the intention here (to ameliorate errors due to the explicit treatment of the pressure drop along the well), but I did not include this second round of updates in computeWellPotentialsImplicit because I was uncertain about the behaviour of this fix when the equations are nonlinear (e.g., cross-flow). It could easily be included, and would lead to fewer failures, but I'm not sure it's necessary. Any opinions on this, @GitPaean, @totto82 or others?
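For readers following along, the structure being discussed is roughly the following (a sketch with hypothetical names; see the linked StandardWell_impl.hpp lines for the actual code):

```cpp
#include <vector>

// Sketch of the two-stage potential calculation: a local well solve,
// then one explicit re-evaluation with the updated pressures so the
// explicitly treated pressure drop along the well is consistent. With
// nonlinear effects such as cross-flow, the second pass is not
// guaranteed to improve the answer, which is why
// computeWellPotentialsImplicit currently omits it.
template <class Well, class Simulator>
std::vector<double>
potentialsWithExplicitPressureUpdate(Well& well, const Simulator& sim)
{
    well.solveWellEquationsLocally(sim);    // stage 1: local solve
    well.updatePressureDropExplicitly(sim); // stage 2: explicit fixup
    return well.evaluateRates(sim);
}
```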

Otherwise, there is a large number (~50) of test failures due to different time-stepping (no clear advantage, just slightly different). Finally, there are 5 tests where --local-well-solve-control-switching=true seems to worsen the results. I will study these a bit more in detail.

steink avatar Jun 13 '24 10:06 steink

I understand the intention here (to ameliorate errors due to the explicit treatment of the pressure drop along the well), but I did not include this second round of updates in computeWellPotentialsImplicit because I was uncertain about the behaviour of this fix when the equations are nonlinear (e.g., cross-flow). It could easily be included, and would lead to fewer failures, but I'm not sure it's necessary. Any opinions on this, @GitPaean, @totto82 or others?

For this part, how big is the difference in the well potential values? If the difference is only minor, I think you can decide. If it is more significant, maybe we should have a look.

GitPaean avatar Jun 14 '24 12:06 GitPaean

I understand the intention here (to ameliorate errors due to the explicit treatment of the pressure drop along the well), but I did not include this second round of updates in computeWellPotentialsImplicit because I was uncertain about the behaviour of this fix when the equations are nonlinear (e.g., cross-flow).

If I remember correctly this improved the results for some particular wells in some models. But this was a long time ago, and I don't remember the model. Maybe it was Norne or SPE9? No strong opinions, as the difference should be small anyway.

totto82 avatar Jun 14 '24 13:06 totto82

@GitPaean @totto82 thanks! From what I can see, the differences are small compared to, e.g., changing the linear solver ;). At some point we discussed having a separate well_state for the potentials, which would avoid the copying and provide better initial guesses for the calculations (this would also give explicit p-drop calculations at the relevant pressure), but this is for later.
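For illustration, that warm-start idea could look something like this sketch (entirely hypothetical; no such member exists today):

```cpp
#include <optional>

// Sketch: keep a dedicated well state for potential calculations so
// each call starts from the previous potential solution instead of
// copying the simulation well state every time.
template <class WellState>
class PotentialStateCache {
    std::optional<WellState> state_;
public:
    WellState& get(const WellState& simulationState)
    {
        if (!state_)
            state_ = simulationState; // first call: seed from simulation
        return *state_;               // later calls: warm start
    }
};
```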

steink avatar Jun 14 '24 14:06 steink