particles icon indicating copy to clipboard operation
particles copied to clipboard

Fixing seeds and reproducibility

Open nchopin opened this issue 9 months ago • 0 comments

Discussed in https://github.com/nchopin/particles/discussions/86

Originally posted by MauroCE May 4, 2024 I really like this package and it's very intuitive to use :) One thing that would make me love it even more would be to allow the user to provide/fix the random state so that results can be exactly reproduced. I am still getting familiar with the abstractions and inner workings of the package, but I was thinking something like this could work:

  1. Defining a function in utils.py that generates an instance of np.random.Generator given either a seed or another rng
def setup_rng(seed, rng):
    assert (seed is None) or (rng is None), "At least one of `seed` or `rng` must be None."
    if rng is None:
        if seed is None:
            seed = np.random.randint(low=1000, high=9999)
        rng = np.random.default_rng(seed=seed)
    return rng
  1. Add reproducibility for resampling schemes by modyfing rs_doc, resampling_scheme, resampling. Mostly this requires adding an extra argument rng=None to the various resampling functions (as well as uniform_spacings and the queue). E.g.
from particles.utils import setup_rng 
rs_doc = """\

    Parameters
    ----------
    W : (N,) ndarray
        normalized weights (>=0, sum to one)
    M : int, optional (set to N if missing)
        number of resampled points.
    rng : np.random.Generator, optional (instantiated at random if missing)
        random number generator for reproducibility

    Returns
    -------
    (M,) ndarray
     M ancestor variables, drawn from range 0, ..., N-1
"""

def resampling_scheme(func):
    """Decorator for resampling schemes."""

    @functools.wraps(func)
    def modif_func(W, M=None, rng=None):
        M = W.shape[0] if M is None else M
        return func(W, M, rng=rng)

    rs_funcs[func.__name__] = modif_func
    modif_func.__doc__ = func.__doc__ + rs_doc
    return modif_func

def resampling(scheme, W, M=None, rng=None):
    rng = setup_rng(seed=None, rng=rng)
    try:
        return rs_funcs[scheme](W, M=M, rng=rng)
    except KeyError:
        raise ValueError(f"{scheme} is not a valid resampling scheme")
        
@resampling_scheme
def systematic(W, M, rng=None):
    """Systematic resampling."""
    su = (rng.uniform(low=0.0, high=1, size=1) + np.arange(M)) / M
    return inverse_cdf(su, W)
  1. Add rng=None to the arguments of ArrayMCMC and store it as an attribute. Then ArrayMetropolis, ArrayRandomWalk etc can use it within the step() and proposal methods respectively. When one instantiates a MCMCSequence one would pass an optional rng. In the same way, rng is passed down from the FKSMCsampler, which gets handed it down from e.g. Tempering or AdaptiveTempering.

The same thing can be done for FeynmanKac and SMC. I tried it on my machine and it seems to work smoothly. One could probably work directly with rng and the argument seed in setup_rng would be redundant. I find that, depending on what I am trying to do and compare results with, I sometimes need to pass a seed or an rng, so could be an option to leave both.

Let me know what you think :)

nchopin avatar May 05 '24 10:05 nchopin