QuantEcon.py icon indicating copy to clipboard operation
QuantEcon.py copied to clipboard

Testing a function with randomness

Open oyamad opened this issue 9 years ago • 16 comments

I think it would be useful if we have a guideline about how to write a unittest for a function that involves randomness.

There are two approaches:

  1. Use np.random.seed (Example: test_quad.py, to match the outcomes to those from the CompEcon toolbox?)
  2. Rely on the law of large numbers (Example: test_discrete_rv)

Is there any "principle"?

oyamad avatar May 06 '15 12:05 oyamad

I think that technique 1 is the default approach and should be fine for most cases. If we have a lot of tests based on the LLN it might take quite a while to run the suite of tests.

jstac avatar May 07 '15 10:05 jstac

I have typically followed Robert Kerns suggestion on SO and created an instance of numpy.random.RandomState which returns a numpy-aware PRNG...

import numpy as np

prng = np.random.RandomState(42)

... now you have an object prng (with methods for generating random arrays from various probability distributions) that can be passed around as an input/arg to functions or other classes.

davidrpugh avatar May 07 '15 10:05 davidrpugh

@davidrpugh : do you now if the RandomState guarantees you the exact same random numbers across computers / platforms ?

One reason to use a seed and not generate a random combination of input is that some tests may have a small probability to fail. For instance, when trying to invert a random matrix, there is a small probability that it will be ill-conditioned. But if a good seed has been found, it can be reused for all subsequent tests and they won't fail randomly.

In the case of test_quad, I suspect that tests could have been written without randomness or comparison with the matlab equivalent so I'm not sure it's the good example to follow.

albop avatar May 07 '15 16:05 albop

I agree with @alpob with respect to test_quad. I initially wrote those tests when I was doing a big push to implement all the quadrature routines from compecon. I didn't take the time to find a good reference for what the answers should be, and had the Matlab answers readily available so I compared against that.

sglyon avatar May 07 '15 16:05 sglyon

@albop From the discussion in the SO post that I linked to I came away with the impression that using np.random.RandomState would give you more consistently reproducible randomness across platforms/computers. Additionally, it allows you to have more than one stream of randomness in the same script (i.e., prng1 = np.random.RandomState(seed) and prng2 = np.random.RandomState(other_seed)) which might well be very useful for tests.

davidrpugh avatar May 07 '15 16:05 davidrpugh

Thanks for the comments. I have opened a PR for mc_sample_path #152.

Is there any clever way to use np.random.RandomState in the test function itself?

oyamad avatar May 08 '15 23:05 oyamad

@oyamad

Not sure what you mean "in the test function"? One possibility would be to create the prng instance externally and then access its methods from within the function:

prng = np.random.RandomState(1234)

def test_function(prng, size, *args, **kwargs):
    """Test some other function."""
    randomness = prng.random_sample(size)
    # More testing code here!

Was this what you had in mind?

davidrpugh avatar May 09 '15 06:05 davidrpugh

@davidrpugh np.random.random is embedded in the tested function mc_sample_path. test_mc_sample_path_seed does not work as desired if np.random.seed is replaced by np.random.RandomState. So I wondered what the advantage of np.random.RandomState over np.random.seed is in this case. I may be misunderstanding something.

oyamad avatar May 09 '15 11:05 oyamad

@oyamad

Where is the function test_mc_sample_path_seed? I can't find it (at least not quickly)...

davidrpugh avatar May 09 '15 12:05 davidrpugh

Sorry, test_mc_sample_path_seed is found here.

oyamad avatar May 09 '15 13:05 oyamad

Given the way that mc_sample_path is currently written you will not be able to use np.random.RandomState(seed) as suggested in the SO post and in this thread.

I wonder if this indicates a problem with the API for mc_sample_path function (and perhaps the Markov chain module more generally). It doesn't appear that a seed value enters into the module API at all. I suppose the implicit assumption is that the user will set the seed as a global variable using np.random.seed(seed)? I would propose changing the API to explicitly include a seed value (or possibly a np.random.RandomState instance, depending) as an input where necessary.

@jstac @albop @mmcky I would be interested in your thoughts on this?

davidrpugh avatar May 09 '15 15:05 davidrpugh

Just a quick note to keep in mind. The Numba docs say:

Numba supports top-level functions from the numpy.random module, but does not allow you to create individual RandomState instances.

Changing the API to support a RandomState instance isn't possible (yet), but I think adding the option of choosing a seed is a good idea (It is nice to be able to replicate the same chain). It seems your suggestion for passing an integer which then has np.random.seed called on it would still be plausible through Numba.

cc7768 avatar May 09 '15 21:05 cc7768

I think @davidrpugh is right --- there should be a simple way to pass in a seed to mc_sample_path and we should then follow a similar construction for other functions that generate time series. I think passing an integer as an optional argument that has np.random.seed called on is the simplest and cleanest solution.

jstac avatar May 10 '15 11:05 jstac

I wrote some sample code (with the code from 95627f2d73e3f50e6f2fce5d465a01f8b8d8461f).

np.random.RandomState seems to work in object mode (at least in Numba 0.18.2); see Out[6] in the notebook above, where the code line in green indicates that the loop is jit-compiled.

oyamad avatar May 10 '15 11:05 oyamad

Indeed, being able to supply a seed argument is a must-have, if we want stochastic results to be really replicable. Giving the user an option to supply a random seed by supplying either a random state or an integer seems perfect. @jstac: np.random.seed and RandomState are not functionally equivalent. The former has side effects and possibly affect all your routines depending on random noise. The latter is in general more predictable. For now, numba has its own seed generator (the equivalent of random.seed) that is isolated from the rest of the library. Then giving seeds as an integer to numba routines, won't affect any random state outside. So using old-school seed in numba routines, (when they need to run in full nopython mode) isn't a big inconvenience.

albop avatar May 13 '15 18:05 albop

For further discussion about what we want to test for a function with randomness, see https://github.com/QuantEcon/QuantEcon.py/pull/166#issuecomment-128986988.

oyamad avatar Aug 27 '15 13:08 oyamad