opm-grid

Return geometry by value (alternative 1)

SoilRos opened this issue 6 months ago • 17 comments

This is to conform with the DUNE interface.

Note that this is possible since the introduction of shared pointers into the geometry. On the other hand, because copying shared pointers can cause performance problems, I also propose an alternative in #884 and compare them with benchmarks.

SoilRos avatar Jun 13 '25 08:06 SoilRos

jenkins build this serial please

SoilRos avatar Jun 13 '25 09:06 SoilRos

jenkins build this serial please

akva2 avatar Jun 13 '25 09:06 akva2

No matter which alternative we end up preferring, we should benchmark to make sure performance does not suffer. I do not expect anything major though.

atgeirr avatar Jun 13 '25 11:06 atgeirr

jenkins build this serial please

SoilRos avatar Jun 26 '25 11:06 SoilRos

jenkins build this serial please

SoilRos avatar Jun 26 '25 13:06 SoilRos

jenkins build this serial please

SoilRos avatar Jun 26 '25 17:06 SoilRos

jenkins build this serial please

SoilRos avatar Jun 27 '25 08:06 SoilRos

jenkins build this serial please

SoilRos avatar Jul 04 '25 10:07 SoilRos

benchmark please

blattms avatar Jul 09 '25 12:07 blattms

Benchmark results did not get posted back here. I am on it. The gist of the benchmark is that there is often a small increase in solution time, often less than 1%. There is one outlier though: Drogon with 8 processes, where the increase is 4%. I have no clue why. We'll investigate that...

blattms avatar Jul 11 '25 08:07 blattms

[...] There is one outlier though: Drogon with 8 processes, where the increase is 4%. I have no clue why. We'll investigate that...

This is not really surprising. Copying a shared pointer involves an atomic increment of the reference counter. Even when only one processor is involved, this can take quite a number of cycles. That is usually manageable if the code is sequential, but as soon as the code becomes parallel all of these increments have to "compete" to lock the bus that communicates with the other processors. So, even if the pointer never has to be shared with other processors, the overhead of performing this operation very often becomes quite noticeable. This is one of the most common pitfalls of using shared pointers.

SoilRos avatar Jul 11 '25 12:07 SoilRos

benchmark please

blattms avatar Jul 14 '25 10:07 blattms

Benchmark result overview:

| Test | Configuration | Relative |
| --- | --- | --- |
| opm-git | OPM Benchmark: drogon - Threads: 1 | 0.829 |
| opm-git | OPM Benchmark: drogon - Threads: 8 | 0.838 |
| opm-git | OPM Benchmark: punqs3 - Threads: 1 | 1 |
| opm-git | OPM Benchmark: punqs3 - Threads: 8 | 0.976 |
| opm-git | OPM Benchmark: smeaheia - Threads: 1 | 0.857 |
| opm-git | OPM Benchmark: smeaheia - Threads: 8 | 0.965 |
| opm-git | OPM Benchmark: spe10_model_1 - Threads: 1 | 1.005 |
| opm-git | OPM Benchmark: spe10_model_1 - Threads: 8 | 0.995 |
| opm-git | OPM Benchmark: flow_mpi_extra - Threads: 1 | 0.884 |
| opm-git | OPM Benchmark: flow_mpi_extra - Threads: 8 | 0.968 |
| opm-git | OPM Benchmark: flow_mpi_norne - Threads: 1 | 1.001 |
| opm-git | OPM Benchmark: flow_mpi_norne - Threads: 8 | 1.01 |
| opm-git | OPM Benchmark: flow_mpi_norne_4c_msw - Threads: 1 - FOPT (Total Oil Production At End Of Run) | 1 |
| opm-git | OPM Benchmark: flow_mpi_norne_4c_msw - Threads: 8 - FOPT (Total Oil Production At End Of Run) | 0.999 |
| opm-git | OPM Benchmark: flow_mpi_norne_4c_msw - Threads: 8 | 0.958 |

  • Speed-up = Total time master / Total time pull request. Above 1.0 is an improvement.

View result details @ https://www.ytelses.com/opm/?page=result&id=2822

ytelses avatar Jul 14 '25 23:07 ytelses

Comparing alternatives 1, 2 and 3:

| Configuration | Relative (alt 1 #885) | Relative (alt 2 #884) | Relative (alt 3 #891) |
| --- | --- | --- | --- |
| OPM Benchmark: drogon - Threads: 1 | 0.829 | 1.007 | 1.004 |
| OPM Benchmark: drogon - Threads: 8 | 0.838 | 0.899 | 0.905 |
| OPM Benchmark: punqs3 - Threads: 1 | 1 | 1.002 | 0.991 |
| OPM Benchmark: punqs3 - Threads: 8 | 0.976 | 0.991 | 0.995 |
| OPM Benchmark: smeaheia - Threads: 1 | 0.857 | 1.006 | 0.895 |
| OPM Benchmark: smeaheia - Threads: 8 | 0.965 | 0.922 | 0.997 |
| OPM Benchmark: spe10_model_1 - Threads: 1 | 1.005 | 1.004 | 0.992 |
| OPM Benchmark: spe10_model_1 - Threads: 8 | 0.995 | 0.991 | 0.995 |
| OPM Benchmark: flow_mpi_extra - Threads: 1 | 0.884 | 1 | 0.948 |
| OPM Benchmark: flow_mpi_extra - Threads: 8 | 0.968 | 0.898 | 1.026 |
| OPM Benchmark: flow_mpi_norne - Threads: 1 | 1.001 | 0.998 | 1.002 |
| OPM Benchmark: flow_mpi_norne - Threads: 8 | 1.01 | 0.923 | 1.007 |
| OPM Benchmark: flow_mpi_norne_4c_msw - Threads: 1 - FOPT (Total Oil Production At End Of Run) | 1 | 1 | 1 |
| OPM Benchmark: flow_mpi_norne_4c_msw - Threads: 8 - FOPT (Total Oil Production At End Of Run) | 0.999 | 1 | 1 |
| OPM Benchmark: flow_mpi_norne_4c_msw - Threads: 8 | 0.958 | | |

  • Speed-up = Total time master / Total time pull request. Above 1.0 is an improvement.

View result details for alt 1 @ https://www.ytelses.com/opm/?page=result&id=2822

View result details for alt 2 @ https://www.ytelses.com/opm/?page=result&id=2823

View result details for alt 3 @ https://www.ytelses.com/opm/?page=result&id=2824

Note that #891 needs additional fixes from #893

blattms avatar Jul 15 '25 10:07 blattms

I am kind of confused by those results: a 17% difference seems way too much for these changes. Out of curiosity, are those benchmarks compiled with Link Time Optimization (LTO)? Since the code in those methods is rather small, it could be that we are mostly measuring how long it takes to jump to the function when LTO is not enabled. And are those simulation times, or do they also include the setup time? I am getting a very large and odd slowdown in the setup of alternative 2 and I don't understand why.

Assuming that those reports are with LTO and the comparison uses simulation time, I get these values locally:

| Configuration | Relative (alt 1 #885) | Relative (alt 2 #884) | Relative (alt 3 #891) |
| --- | --- | --- | --- |
| OPM Benchmark: drogon - Threads: 1 | 0.998 | 0.988 | 0.985 |
| OPM Benchmark: drogon - Threads: 8 | 1 | 0.985 | 0.972 |

If those are not done with LTO, I would suggest at least enabling it for the benchmarks and production environments. The speed-up in drogon is about 8% with any of the alternatives when LTO is on.
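For reference, a sketch of how LTO could be switched on for a CMake-based build like this one, using CMake's standard interprocedural-optimization flag (the source directory path is a placeholder; I have not checked what the benchmark harness actually passes):

```shell
# Configure a release build with LTO enabled via CMake's
# portable interprocedural-optimization switch.
cmake -DCMAKE_BUILD_TYPE=Release \
      -DCMAKE_INTERPROCEDURAL_OPTIMIZATION=ON \
      /path/to/opm-grid
```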

SoilRos avatar Jul 22 '25 16:07 SoilRos

jenkins build this serial please

SoilRos avatar Jul 31 '25 14:07 SoilRos

jenkins build this serial please

SoilRos avatar Aug 01 '25 10:08 SoilRos