Modify Appleyard chopping
Restrict the updates of rs/rv, rsw/rvw and zfraction in the extended blackoil model by the saturation scaling factor from the Appleyard chopping.
This is part 1 of https://github.com/OPM/opm-models/pull/803
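For context, here is a minimal, hypothetical sketch of the idea in Python (it is not the opm-models C++ code; the chop limit, variable names and the exact set of chopped variables are assumptions): the scaling factor that the Appleyard chop computes for the saturation update is reused to scale the updates of the dissolution/vaporization ratios and zfraction, rather than applying the full Newton update to them independently.

```python
# Hypothetical illustration only, not the opm-models implementation.

MAX_DELTA_SAT = 0.2  # assumed maximum saturation change per Newton iteration

# Variables whose Newton update is scaled by the Appleyard factor
# (saturations plus, with this change, the rs/rv-type variables).
CHOPPED_VARS = {"sw", "sg", "rs", "rv", "rsw", "rvw", "zfraction"}


def appleyard_factor(sat_deltas, max_delta=MAX_DELTA_SAT):
    """Scaling factor in (0, 1] that limits the largest proposed
    saturation change to max_delta (the Appleyard chop)."""
    if not sat_deltas:
        return 1.0
    largest = max(abs(d) for d in sat_deltas)
    return 1.0 if largest <= max_delta else max_delta / largest


def apply_update(primary_vars, delta):
    """Apply the Newton update for one cell, where delta[k] is the
    proposed change of variable k."""
    alpha = appleyard_factor([delta[k] for k in ("sw", "sg") if k in delta])
    updated = dict(primary_vars)
    for key, change in delta.items():
        scale = alpha if key in CHOPPED_VARS else 1.0
        updated[key] = primary_vars[key] + scale * change
    return updated
```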
jenkins build this please
benchmark please
Black is PR. Green is reference results. All failures below are in the summary file comparisons:

| Vector | Failing entries | Largest absolute error | Largest relative error |
|---|---|---|---|
| FWIR | 3 | 2.9583301e+02 | 1.7041231e-01 |
| GGPI:INJE | 2 | 1.9626675e+08 | 2.3400146e-01 |
| WWPR:PROD3 | 1 | 2.2281599e-01 | 7.5347273e-02 |
| WWPGR:PROD1 | 1 | 1.2884613e+02 | 1.1923457e-01 |
| WWPR:PROD3 | 6 | 5.3414059e-01 | 1.0045919e-01 |
| WLPR:OPU02 | 4 | 1.0356105e+02 | 4.8567018e-01 |
| WWIR:WIL01 | 3 | 3.1060254e+02 | 7.2784564e-02 |
| WWPR:PROD3 | 4 | 2.0443020e+00 | 1.0837778e-01 |
| WGIP:INJ1 | 3 | 1.3314957e+08 | 1.8527693e-01 |
| WWPR:PROD3 | 1 | 1.1698884e-01 | 2.0989576e-01 |
| WWPR:PROD3 | 1 | 1.1698884e-01 | 2.0989576e-01 |
| WWPR:PROD1 | 1 | 1.2119231e+00 | 1.9043798e-01 |
| WWPR:PROD1 | 2 | 1.6109295e+00 | 1.0339842e-01 |
| WBHP:INJ1 | 1 | 2.6048584e+01 | 7.7440490e-02 |
| WBHP:INJ1 | 1 | 4.0556641e+01 | 9.2056089e-02 |
| WTHP:PROD1 | 1 | 6.5482483e+00 | 6.0313764e-02 |
| WWPR:PROD1 | 2 | 8.0917358e+01 | 2.1558314e-01 |
| WOPR:PROD3 | 1 | 4.6544556e+01 | 1.1191940e-01 |
| WOPR:OP_1 | 1 | 3.7582776e+00 | 9.9941726e-01 |
| WWIR:WI_1 | 2 | 1.3175140e+02 | 7.6847942e-01 |
| WOPR:OP_1 | 2 | 1.8420239e+02 | 1.9601552e-01 |
| WOPR:OP_1 | 4 | 2.2667194e+02 | 3.0097662e-01 |
| WTHP:PROD3 | 9 | 2.4072021e+01 | 5.8558714e-01 |
| WOPR:PROD3 | 1 | 1.9500918e+02 | 7.5801425e-01 |
I have manually gone through the test failures and plotted the significant deviations. Cases not shown have only minor changes. The significant differences point to changes in the time-stepping, which in turn affect the results. Some more testing on field models is needed before concluding whether these changes improve the stability of the Newton update, but the test models are OK IMO.
I think it would be good to automate the process of evaluating the test failures; currently this involves significant manual work. What I have done this time is to go through the test failures and plot the worst offending vectors using qsummary, for example:
```sh
~/workspace/opm/qsummary/build/qsummary \
    flow+udq_wconprod/UDQ_WCONPROD. \
    ~/workspace/opm/opm-tests/udq_actionx/opm-simulation-reference/flow/UDQ_WCONPROD. \
    -v WLPR:OPU02
```
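A small script could drive this automatically by parsing the failure report and invoking qsummary for each offending vector. Below is a rough, hypothetical Python sketch; the report format, the error threshold and the assumption that qsummary is on the PATH are mine, not part of the current test infrastructure:

```python
#!/usr/bin/env python3
"""Hypothetical sketch: parse a regression failure report and plot every
offending summary vector with qsummary."""
import re
import subprocess
import sys

QSUMMARY = "qsummary"  # assumed to be on PATH


def parse_failures(report_path):
    """Yield (vector, largest_relative_error) pairs from lines such as
    'WLPR:OPU02: Summary file' followed by 'Largest relative error: ...'."""
    vector = None
    with open(report_path) as report:
        for line in report:
            name = re.match(r"^(\w+(?::\w+)?): Summary file", line)
            if name:
                vector = name.group(1)
                continue
            err = re.match(r"^Largest relative error:\s*([\d.eE+-]+)", line)
            if err and vector is not None:
                yield vector, float(err.group(1))
                vector = None


def plot_worst(report_path, pr_case, ref_case, threshold=0.1):
    """Call qsummary for each vector whose relative error exceeds threshold."""
    for vector, rel_err in parse_failures(report_path):
        if rel_err >= threshold:
            subprocess.run([QSUMMARY, pr_case, ref_case, "-v", vector],
                           check=False)


if __name__ == "__main__":
    # Usage: plot_failures.py <report.txt> <pr_case_prefix> <ref_case_prefix>
    plot_worst(sys.argv[1], sys.argv[2], sys.argv[3])
```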
@akva2 What do you think? Could the current test infrastructure be extended with such a workflow?
Benchmark result overview:
| Test | Configuration | Relative speed-up |
|---|---|---|
| opm-git | OPM Benchmark: drogon - Threads: 1 | 0.979 |
| opm-git | OPM Benchmark: drogon - Threads: 8 | 0.617 |
| opm-git | OPM Benchmark: smeaheia - Threads: 1 | 1 |
| opm-git | OPM Benchmark: smeaheia - Threads: 8 | 1 |
| opm-git | OPM Benchmark: spe10_model_1 - Threads: 1 | 0.991 |
| opm-git | OPM Benchmark: spe10_model_1 - Threads: 8 | 1.001 |
| opm-git | OPM Benchmark: flow_mpi_extra - Threads: 1 | 1.06 |
| opm-git | OPM Benchmark: flow_mpi_extra - Threads: 8 | 0.945 |
| opm-git | OPM Benchmark: flow_mpi_norne - Threads: 1 | 1.016 |
| opm-git | OPM Benchmark: flow_mpi_norne - Threads: 8 | 0.975 |
- Speed-up = Total time master / Total time pull request. Above 1.0 is an improvement (e.g., 0.617 for drogon with 8 threads means the PR run took roughly 1/0.617 ≈ 1.6 times as long as master).
View result details @ https://www.ytelses.com/opm/?page=result&id=2114
> I think it would be good to automate the process of evaluating the test failures; currently this involves significant manual work.
Yes, indeed. Manually going through all the test failures (very often there are tens of them) is significant work. When the time stepping changes, the current jenkins comparisons basically stop being meaningful. If jenkins could produce all the relevant plots, that would be a big step in the right direction.
I looked in a bit more detail at the benchmark results for drogon. With CPRW the performance is similar for this PR and master.
It was suggested by @hnil to use fixed timesteps with no chopping for all feature tests, to avoid this quagmire. I think that is a good idea that could avoid a lot of extra work with tests that seemingly fail. An alternative that may be a bit weaker would be to only compare the solutions at the report steps.
For this concrete PR, I very much want to say "go ahead" but the changes are large enough to make me a little nervous. Maybe for one or a few of the "most failing" cases you could take a reference run, extract the timesteps, then run using the PR with the fixed steps, and see if the difference is then significant? (If not, then the difference was only caused by different timestepping and we are good to go.)
> Maybe for one or a few of the "most failing" cases you could take a reference run,
The difficulty is that it is not easy to tell which cases are the "most failing" ones. We might overlook real problems, for example a bug, and then the purpose of the jenkins regression tests is defeated to a large extent.
> The difficulty is that it is not easy to tell which cases are the "most failing" ones. We might overlook real problems, for example a bug, and then the purpose of the jenkins regression tests is defeated to a large extent.
I agree, and I did not intend this as a new general procedure, only as a way to assess the test failures here a bit better, while at the same time seeing if the idea of fixing timesteps is workable.
jenkins build this please