pymor
pymor copied to clipboard
add `random_state` and `seed` arguments to `rrf` and `adaptive_rrf`
This PR adds random_state=None
and seed=None
to the arguments of rrf
and adaptive_rrf
in order to enable reproducible computations and fixes a typo in the docstring of VectorArray.random
.
Codecov Report
Merging #1564 (a0b0a8e) into main (d69e1be) will increase coverage by
0.16%
. The diff coverage isn/a
.
:exclamation: Current head a0b0a8e differs from pull request most recent head e4f889b. Consider uploading reports for the commit e4f889b to get more accurate results
Impacted Files | Coverage Δ | |
---|---|---|
src/pymor/reductors/h2.py | 100.00% <ø> (+22.53%) |
:arrow_up: |
src/pymor/models/iosys.py | 79.12% <0.00%> (+0.29%) |
:arrow_up: |
src/pymor/bindings/dunegdt.py | 76.85% <0.00%> (+2.47%) |
:arrow_up: |
src/pymortests/mpi_run_demo_tests.py | 85.10% <0.00%> (+4.25%) |
:arrow_up: |
Looks good to me so far. Would you also add seed
to adaptive_rrf
?
Hopefully, this doesn't cause strange git conflicts with #1552...
We actually have the policy that seed
parameters go along with random_state
parameters as in ParameterSpace.sample_randomy
. Grepping through pyMOR, I seem to be the only person following this policy. I quite like it, however, so I would encourage following it here as well. In particular, the line
random_state = get_random_state(random_state, seed)
ensures that by default a global random state is used, which is seeded by a pyMOR default. Of course, that mens that calling rrf
twice in a row will lead to (deterministically) different results. However, this is what I would expect from a randomized method.
We actually have the policy that
seed
parameters go along withrandom_state
parameters as inParameterSpace.sample_randomy
. Grepping through pyMOR, I seem to be the only person following this policy. I quite like it, however, so I would encourage following it here as well. In particular, the linerandom_state = get_random_state(random_state, seed)
ensures that by default a global random state is used, which is seeded by a pyMOR default.
I also came across this part of the code when getting to the bottom of the inner workings of VectorArray.random
. I was a bit unsure how to go about it at the time. random_state = get_random_state(random_state, seed)
gets called in VectorSpace.random
so I assumed it would be correct.
I think I understand it a bit better now and I have pushed a new commit. Is this correct now?
What bugs me a little bit with the latest version is the encapsulated redundancy of the assertions and generation of random states in the rrf
methods and VectorSpace.random
. I guess a solution would be to only have random_state
as an argument and do the seeding manually before calling rrf
...
Of course, that mens that calling
rrf
twice in a row will lead to (deterministically) different results. However, this is what I would expect from a randomized method.
In the latest commit f05fcc6b79eebe87e2c1c5288abf20d7feec01b2, calling rrf(A, seed=1)
twice gives me identical results (that is what I want).
Another issue is that I am still struggling with the @defaults
decorator. Should random_state
and seed
be included there as well?
Looks good to me so far. Would you also add
seed
toadaptive_rrf
?
Yes.
Hopefully, this doesn't cause strange git conflicts with #1552...
I implemented this locally in order to test that PR and I think this is a handy feature. I don't think this should interfere functionally as the arguments are set to None
per default which is what happened before anyway. Of course a rebase will be necessary. If #1552 gets to messy, I don't mind waiting with the merge and then rebasing this PR.
What bugs me a little bit with the latest version is the encapsulated redundancy of the assertions and generation of random states in the
rrf
methods andVectorSpace.random
. I guess a solution would be to only haverandom_state
as an argument and do the seeding manually before callingrrf
...
I'm not sure if I understand what you mean. If you pass random_state
to VectorSpace.random
, as you do, this random state will be used and no additional seeding takes place (see pymor.tools.random.get_random_state
).
Of course, that mens that calling
rrf
twice in a row will lead to (deterministically) different results. However, this is what I would expect from a randomized method.In the latest commit f05fcc6, calling
rrf(A, seed=1)
twice gives me identical results (that is what I want).
Yes, that should also be the case. However, if you call rrf
twice without setting a seed, you should get different results, which, however, should be the same when you run the same script containing these calls twice.
Another issue is that I am still struggling with the
@defaults
decorator. Shouldrandom_state
andseed
be included there as well?
No, the idea is to have single seed for the entire program run which is a default for get_random_state
. So, if you want to call a randomized method in pyMOR twice in the same script and want to get the same result, you have to manually specify a seed. However, using get_random_state
ensures that subsequent runs of the same script always yield the same result.
Yes, that should also be the case. However, if you call
rrf
twice without setting a seed, you should get different results, which, however, should be the same when you run the same script containing these calls twice.No, the idea is to have single seed for the entire program run which is a default for
get_random_state
. So, if you want to call a randomized method in pyMOR twice in the same script and want to get the same result, you have to manually specify a seed. However, usingget_random_state
ensures that subsequent runs of the same script always yield the same result.
Ok great, then I think I understood it correctly.
I'm not sure if I understand what you mean. If you pass
random_state
toVectorSpace.random
, as you do, this random state will be used and no additional seeding takes place (seepymor.tools.random.get_random_state
).
What I meant was that
https://github.com/pymor/pymor/blob/a0b0a8e7add525872ede4141c7d468a10fcc496f/src/pymor/vectorarrays/interface.py#L856-L857
is now done in the rrf
functions, as well as the VectorSpace.random
method, which is a bit redundant. If for example, random_state
and seed
would then be also added as arguments to randomized SVD which calls rrf
these two lines would also have to be in the randomized SVD function.
It shouldn't be a problem really, just lets me think about whether its cleaner to manually do the seeding when necessesary and then only pass random_state
to the functions of rand_la.py
It shouldn't be a problem really, just lets me think about whether its cleaner to manually do the seeding when necessesary and then only pass random_state to the functions of rand_la.py
I quite like the idea of removing the seed
parameter everywhere and replacing it by random_state
(where it is missing). That would mean that the methods in lrradi
, samdp
, eigs
would loose their seed
defaults
replacing them by a global default
seed
for the default random state. Thus, as already said above, this would mean that consecutive calls to these methods would lead to different results, deterministically determined by the global seed and the execution order. If one would want to have the same random state one would have to pass a manually seeded random state to the method. I would assume that this is rarely needed. What do you think @pmli, @lbalicki?
It shouldn't be a problem really, just lets me think about whether its cleaner to manually do the seeding when necessesary and then only pass random_state to the functions of rand_la.py
I quite like the idea of removing the
seed
parameter everywhere and replacing it byrandom_state
(where it is missing). That would mean that the methods inlrradi
,samdp
,eigs
would loose theirseed
defaults
replacing them by a globaldefault
seed
for the default random state. Thus, as already said above, this would mean that consecutive calls to these methods would lead to different results, deterministically determined by the global seed and the execution order. If one would want to have the same random state one would have to pass a manually seeded random state to the method. I would assume that this is rarely needed. What do you think @pmli, @lbalicki?
Up to randomization that is difficult or expensive to avoid (e.g., parallelization in BLAS), I think having as much determinism as possible is desirable (e.g., to have more-or-less reproducible code for publications). So, although it might seem unintuitive, I would be for methods, that use random numbers, to return the same result on consecutive calls. I was actually thinking that we should discuss JAX's approach (basically, making seed
a required argument without a default value, and a way to generate new seeds) and whether it makes sense to do something similar in pyMOR.
Let's continue the big design discussion in #1577.
This can be closed since merging #1736, right?
Yes!