pymor icon indicating copy to clipboard operation
pymor copied to clipboard

add `random_state` and `seed` arguments to `rrf` and `adaptive_rrf`

Open artpelling opened this issue 2 years ago • 10 comments

This PR adds random_state=None and seed=None to the arguments of rrf and adaptive_rrf in order to enable reproducible computations and fixes a typo in the docstring of VectorArray.random.

artpelling avatar Feb 09 '22 17:02 artpelling

Codecov Report

Merging #1564 (a0b0a8e) into main (d69e1be) will increase coverage by 0.16%. The diff coverage is n/a.

:exclamation: Current head a0b0a8e differs from pull request most recent head e4f889b. Consider uploading reports for the commit e4f889b to get more accurate results

Impacted Files Coverage Δ
src/pymor/reductors/h2.py 100.00% <ø> (+22.53%) :arrow_up:
src/pymor/models/iosys.py 79.12% <0.00%> (+0.29%) :arrow_up:
src/pymor/bindings/dunegdt.py 76.85% <0.00%> (+2.47%) :arrow_up:
src/pymortests/mpi_run_demo_tests.py 85.10% <0.00%> (+4.25%) :arrow_up:

codecov[bot] avatar Feb 10 '22 11:02 codecov[bot]

Looks good to me so far. Would you also add seed to adaptive_rrf?

Hopefully, this doesn't cause strange git conflicts with #1552...

pmli avatar Feb 10 '22 13:02 pmli

We actually have the policy that seed parameters go along with random_state parameters as in ParameterSpace.sample_randomy. Grepping through pyMOR, I seem to be the only person following this policy. I quite like it, however, so I would encourage following it here as well. In particular, the line

random_state = get_random_state(random_state, seed)

ensures that by default a global random state is used, which is seeded by a pyMOR default. Of course, that mens that calling rrf twice in a row will lead to (deterministically) different results. However, this is what I would expect from a randomized method.

sdrave avatar Feb 10 '22 16:02 sdrave

We actually have the policy that seed parameters go along with random_state parameters as in ParameterSpace.sample_randomy. Grepping through pyMOR, I seem to be the only person following this policy. I quite like it, however, so I would encourage following it here as well. In particular, the line

random_state = get_random_state(random_state, seed)

ensures that by default a global random state is used, which is seeded by a pyMOR default.

I also came across this part of the code when getting to the bottom of the inner workings of VectorArray.random. I was a bit unsure how to go about it at the time. random_state = get_random_state(random_state, seed) gets called in VectorSpace.random so I assumed it would be correct. I think I understand it a bit better now and I have pushed a new commit. Is this correct now?

What bugs me a little bit with the latest version is the encapsulated redundancy of the assertions and generation of random states in the rrf methods and VectorSpace.random. I guess a solution would be to only have random_state as an argument and do the seeding manually before calling rrf...

Of course, that mens that calling rrf twice in a row will lead to (deterministically) different results. However, this is what I would expect from a randomized method.

In the latest commit f05fcc6b79eebe87e2c1c5288abf20d7feec01b2, calling rrf(A, seed=1) twice gives me identical results (that is what I want).

Another issue is that I am still struggling with the @defaults decorator. Should random_state and seed be included there as well?

artpelling avatar Feb 10 '22 17:02 artpelling

Looks good to me so far. Would you also add seed to adaptive_rrf?

Yes.

Hopefully, this doesn't cause strange git conflicts with #1552...

I implemented this locally in order to test that PR and I think this is a handy feature. I don't think this should interfere functionally as the arguments are set to None per default which is what happened before anyway. Of course a rebase will be necessary. If #1552 gets to messy, I don't mind waiting with the merge and then rebasing this PR.

artpelling avatar Feb 10 '22 17:02 artpelling

What bugs me a little bit with the latest version is the encapsulated redundancy of the assertions and generation of random states in the rrf methods and VectorSpace.random. I guess a solution would be to only have random_state as an argument and do the seeding manually before calling rrf...

I'm not sure if I understand what you mean. If you pass random_state to VectorSpace.random, as you do, this random state will be used and no additional seeding takes place (see pymor.tools.random.get_random_state).

Of course, that mens that calling rrf twice in a row will lead to (deterministically) different results. However, this is what I would expect from a randomized method.

In the latest commit f05fcc6, calling rrf(A, seed=1) twice gives me identical results (that is what I want).

Yes, that should also be the case. However, if you call rrf twice without setting a seed, you should get different results, which, however, should be the same when you run the same script containing these calls twice.

Another issue is that I am still struggling with the @defaults decorator. Should random_state and seed be included there as well?

No, the idea is to have single seed for the entire program run which is a default for get_random_state. So, if you want to call a randomized method in pyMOR twice in the same script and want to get the same result, you have to manually specify a seed. However, using get_random_state ensures that subsequent runs of the same script always yield the same result.

sdrave avatar Feb 11 '22 11:02 sdrave

Yes, that should also be the case. However, if you call rrf twice without setting a seed, you should get different results, which, however, should be the same when you run the same script containing these calls twice.

No, the idea is to have single seed for the entire program run which is a default for get_random_state. So, if you want to call a randomized method in pyMOR twice in the same script and want to get the same result, you have to manually specify a seed. However, using get_random_state ensures that subsequent runs of the same script always yield the same result.

Ok great, then I think I understood it correctly.

I'm not sure if I understand what you mean. If you pass random_state to VectorSpace.random, as you do, this random state will be used and no additional seeding takes place (see pymor.tools.random.get_random_state).

What I meant was that

https://github.com/pymor/pymor/blob/a0b0a8e7add525872ede4141c7d468a10fcc496f/src/pymor/vectorarrays/interface.py#L856-L857

is now done in the rrf functions, as well as the VectorSpace.random method, which is a bit redundant. If for example, random_state and seed would then be also added as arguments to randomized SVD which calls rrf these two lines would also have to be in the randomized SVD function.

It shouldn't be a problem really, just lets me think about whether its cleaner to manually do the seeding when necessesary and then only pass random_state to the functions of rand_la.py

artpelling avatar Feb 11 '22 11:02 artpelling

It shouldn't be a problem really, just lets me think about whether its cleaner to manually do the seeding when necessesary and then only pass random_state to the functions of rand_la.py

I quite like the idea of removing the seed parameter everywhere and replacing it by random_state (where it is missing). That would mean that the methods in lrradi, samdp, eigs would loose their seed defaults replacing them by a global default seed for the default random state. Thus, as already said above, this would mean that consecutive calls to these methods would lead to different results, deterministically determined by the global seed and the execution order. If one would want to have the same random state one would have to pass a manually seeded random state to the method. I would assume that this is rarely needed. What do you think @pmli, @lbalicki?

sdrave avatar Feb 16 '22 09:02 sdrave

It shouldn't be a problem really, just lets me think about whether its cleaner to manually do the seeding when necessesary and then only pass random_state to the functions of rand_la.py

I quite like the idea of removing the seed parameter everywhere and replacing it by random_state (where it is missing). That would mean that the methods in lrradi, samdp, eigs would loose their seed defaults replacing them by a global default seed for the default random state. Thus, as already said above, this would mean that consecutive calls to these methods would lead to different results, deterministically determined by the global seed and the execution order. If one would want to have the same random state one would have to pass a manually seeded random state to the method. I would assume that this is rarely needed. What do you think @pmli, @lbalicki?

Up to randomization that is difficult or expensive to avoid (e.g., parallelization in BLAS), I think having as much determinism as possible is desirable (e.g., to have more-or-less reproducible code for publications). So, although it might seem unintuitive, I would be for methods, that use random numbers, to return the same result on consecutive calls. I was actually thinking that we should discuss JAX's approach (basically, making seed a required argument without a default value, and a way to generate new seeds) and whether it makes sense to do something similar in pyMOR.

pmli avatar Feb 16 '22 13:02 pmli

Let's continue the big design discussion in #1577.

sdrave avatar Feb 17 '22 15:02 sdrave

This can be closed since merging #1736, right?

pmli avatar Oct 26 '22 18:10 pmli

Yes!

artpelling avatar Oct 27 '22 09:10 artpelling