
Add LGRs on a distributed grid (first steps)

Open · aritorto opened this pull request 1 year ago • 8 comments

This PR

  • refactors some refinement-related methods to handle the case where the grid has been distributed before refinement,
  • tests a few cases where LGRs are added after the grid has been distributed (see the sketch below),
  • allows global refinement of a distributed grid only once (i.e., for now it can be invoked a single time, with argument equal to 1),
  • does not require any updates to the reference manual.

aritorto avatar Aug 07 '24 12:08 aritorto
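For orientation, here is a minimal sketch of the workflow this PR targets: distribute first, then refine. It assumes the CpGrid methods createCartesian, loadBalance, addLgrsUpdateLeafView and globalRefine; the grid and LGR sizes are made up, and exact argument details may differ.

```cpp
// Hedged sketch: add an LGR (or global refinement) after load balancing a CpGrid.
#include <dune/common/parallel/mpihelper.hh>
#include <opm/grid/CpGrid.hpp>

#include <array>
#include <string>
#include <vector>

int main(int argc, char** argv)
{
    Dune::MPIHelper::instance(argc, argv);              // initialize MPI once

    Dune::CpGrid grid;
    grid.createCartesian({4, 3, 3}, {1.0, 1.0, 1.0});   // small Cartesian test grid

    grid.loadBalance();                                  // distribute the grid first

    // Then add one LGR on the distributed grid: refine the block of cells with
    // IJK in [1,3) x [0,2) x [0,2), splitting each parent cell into 2x2x2 children.
    const std::vector<std::array<int, 3>> cellsPerDim = {{2, 2, 2}};  // children per parent
    const std::vector<std::array<int, 3>> startIJK    = {{1, 0, 0}};  // block start (IJK)
    const std::vector<std::array<int, 3>> endIJK      = {{3, 2, 2}};  // block end (exclusive)
    const std::vector<std::string>        lgrNames    = {"LGR1"};
    grid.addLgrsUpdateLeafView(cellsPerDim, startIJK, endIJK, lgrNames);

    // Alternatively, global refinement of the distributed grid
    // (per this PR: callable only once, and only with argument 1):
    // grid.globalRefine(1);

    return 0;
}
```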

jenkins build this please

aritorto avatar Aug 07 '24 12:08 aritorto

jenkins build this please

aritorto avatar Aug 07 '24 13:08 aritorto

addLgrsDistGrid.log

I get different output. In any case, I set the PR as draft for now. (Screenshot from 2024-08-08 attached.)

aritorto avatar Aug 08 '24 05:08 aritorto

jenkins build this please

aritorto avatar Aug 08 '24 07:08 aritorto

jenkins build this please

aritorto avatar Aug 08 '24 13:08 aritorto

jenkins build this please

aritorto avatar Aug 12 '24 05:08 aritorto

Last commit inspired by a suggestion from @blattms on the draft PR.

aritorto avatar Aug 16 '24 08:08 aritorto

jenkins build this please

aritorto avatar Aug 19 '24 14:08 aritorto

I'm marking this PR ready for review, even though I still need to work a bit more on the test file, which currently does not fail when you run "one test case per run" (this explains the large commented-out pieces in the test). It seems it could be improved by selecting to which process each cell should be sent. In the meantime, I'll run jenkins too.

aritorto avatar Aug 22 '24 09:08 aritorto

jenkins build this please

aritorto avatar Aug 22 '24 09:08 aritorto

jenkins build this please

aritorto avatar Aug 22 '24 12:08 aritorto

Something might need to be fixed: running all the tests defined in addLgrsOnDistrubutedGrid_test.cpp together fails, but running each of them separately does not.

aritorto avatar Aug 23 '24 10:08 aritorto

jenkins build this please

aritorto avatar Aug 23 '24 10:08 aritorto

Did not look at the details, but it sounds like a fixture initializing MPI et al.

akva2 avatar Aug 23 '24 10:08 akva2

Thanks @akva2. I'm a bit new to MPI, so I'm not 100% sure what you meant. In any case, I also noticed that, when the requirements for refinement are not fulfilled, I should "throw in all the processes" (terminate), not only in one of them. I'll fix that soon. Does this happen to be related to your comment?

aritorto avatar Aug 23 '24 11:08 aritorto

Not really. What I mean is: if you have a test fixture which calls MPI_Init / MPI_Finalize (directly or indirectly), you are in trouble, because those can only be called once per program invocation. So it would work fine running one test, but since the fixture is constructed/destructed once per test, you'd be in trouble running multiple tests.

akva2 avatar Aug 23 '24 11:08 akva2

Thanks a lot for the explanation. So these last commits, where I separated the test cases (one per file), might make sense in the end, if I got it correctly. I'll think about it! Thanks again @akva2

aritorto avatar Aug 23 '24 11:08 aritorto

you can see an example of the typical way of solving this in https://github.com/OPM/opm-simulators/blob/master/tests/test_broadcast.cpp

in particular note the use of BOOST_TEST_NO_MAIN and implementation of a main method.

akva2 avatar Aug 23 '24 11:08 akva2
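For reference, the general shape of the pattern pointed to above (a hand-written main so MPI is initialized exactly once for all Boost test cases) looks roughly like this; the module name and test case below are placeholders, not the actual test file, and the Boost.Test init API used matches the linked example under the usual OPM build setup.

```cpp
// Sketch of the pattern referenced above (cf. tests/test_broadcast.cpp in
// opm-simulators): disable Boost.Test's generated main and initialize MPI
// exactly once in a hand-written main, so running several test cases in one
// executable does not re-initialize/finalize MPI per fixture.
#define BOOST_TEST_MODULE AddLgrsOnDistributedGridTests  // placeholder module name
#define BOOST_TEST_NO_MAIN

#include <boost/test/unit_test.hpp>
#include <dune/common/parallel/mpihelper.hh>

BOOST_AUTO_TEST_CASE(someDistributedRefinementCase)      // placeholder test case
{
    BOOST_CHECK(true);
}

bool init_unit_test_func()
{
    return true;
}

int main(int argc, char** argv)
{
    // MPIHelper::instance calls MPI_Init once and MPI_Finalize at exit.
    Dune::MPIHelper::instance(argc, argv);
    return boost::unit_test::unit_test_main(&init_unit_test_func, argc, argv);
}
```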

@akva2 We actually found out what the problem was:

It seems like Zoltan has some kind of internal state (maybe a random number generator) and you cannot rely on the partitioning being the same the second time you call it (or if you previously computed another partitioning in addition).

I guess this might even be a feature: If you compute a partitioning and later realize that it was rubbish, you could start over and hope to get a better partitioning by chance. :sweat_smile:

Even for our small example, computing another partitioning beforehand has quite an impact:

Below I print the owned cell indices per rank.

With an additional previous partitioning call to Zoltan:
rank 0: 0 1 4 5 8 9 16 20 21
rank 1: 12 13 17 24 25 28 29 32 33
rank 2: 2 3 6 7 10 11 18 22 23
rank 3: 14 15 19 26 27 30 31 34 35

Without the previous partitioning:
rank 0: 14 15 18 19 26 27 30 31 35
rank 1: 2 3 6 7 10 11 22 23 34
rank 2: 12 13 16 17 24 25 28 29 33
rank 3: 0 1 4 5 8 9 20 21 32

Hence we need to force our partitions in this case to make this stable.

blattms avatar Aug 26 '24 07:08 blattms
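A minimal sketch of what "forcing the partition" could look like in a test, assuming a loadBalance overload that accepts an explicit cell-to-rank vector (one destination rank per global cell); the helper name and the block rule are made up, and the exact overload may differ.

```cpp
// Hedged sketch: make the test partitioning deterministic by handing
// loadBalance an explicit cell-to-rank assignment instead of letting
// Zoltan decide.
#include <dune/common/parallel/mpihelper.hh>
#include <opm/grid/CpGrid.hpp>

#include <algorithm>
#include <vector>

void distributeDeterministically(Dune::CpGrid& grid)
{
    // Assumes every rank still holds the full, undistributed grid
    // (as when createCartesian was called on all ranks).
    const int numCells = grid.size(0);
    const int numRanks = grid.comm().size();

    // Simple fixed rule: contiguous blocks of cells per rank. Any fixed rule
    // works, as long as every process builds the same vector.
    const int chunk = (numCells + numRanks - 1) / numRanks;
    std::vector<int> parts(numCells);
    for (int cell = 0; cell < numCells; ++cell)
        parts[cell] = std::min(numRanks - 1, cell / chunk);

    grid.loadBalance(parts);   // force this partition on every run
}
```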

Ah, that old fun. There's a PHG_RANDOMIZE_INPUT parameter that might help. It could also be LB_APPROACH not set to PARTITION.

akva2 avatar Aug 26 '24 07:08 akva2
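For context, these are plain Zoltan parameters; a minimal sketch using the raw Zoltan C API is below. It does not claim to show where opm-grid configures its Zoltan handle, and it assumes MPI_Init and Zoltan_Initialize have already been called.

```cpp
// Hedged sketch: set the two Zoltan parameters mentioned above.
#include <mpi.h>
#include <zoltan.h>

void configureZoltan(MPI_Comm comm)
{
    struct Zoltan_Struct* zz = Zoltan_Create(comm);

    // Disable randomized ordering of the input, which can make the
    // hypergraph (PHG) partitioner non-deterministic between calls.
    Zoltan_Set_Param(zz, "PHG_RANDOMIZE_INPUT", "0");

    // Compute a partition from scratch instead of repartitioning or
    // refining a previous one.
    Zoltan_Set_Param(zz, "LB_APPROACH", "PARTITION");

    // ... register query functions and call Zoltan_LB_Partition here ...

    Zoltan_Destroy(&zz);
}
```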

Cool, thanks. The result may still differ between different versions, etc.

blattms avatar Aug 26 '24 07:08 blattms

Thanks @akva2 and @blattms for your feedback. The PR is ready for review, after a jenkins check

aritorto avatar Aug 26 '24 11:08 aritorto

jenkins build this please

aritorto avatar Aug 26 '24 11:08 aritorto

jenkins build this serial please

blattms avatar Aug 27 '24 08:08 blattms

jenkins build this serial please

aritorto avatar Aug 27 '24 10:08 aritorto

jenkins build this serial please

aritorto avatar Aug 28 '24 08:08 aritorto

jenkins build this serial please

aritorto avatar Aug 28 '24 14:08 aritorto

jenkins build this serial please

aritorto avatar Aug 29 '24 11:08 aritorto

jenkins build this serial please

aritorto avatar Aug 29 '24 13:08 aritorto

Thanks a lot for all the work. I am ready to merge this. Should we maybe reduce the massive number of commits by rewriting history first?

blattms avatar Aug 30 '24 06:08 blattms