
Input parameter space random state is not retained

Open · kkanellis opened this issue 4 years ago · 0 comments

When defining an input space, it is possible to pass a random_state argument. For example:

import random

from mlos.Spaces import SimpleHypergrid, DiscreteDimension

# Define random state
RANDOM_SEED = 42
random_state = random.Random()
random_state.seed(RANDOM_SEED)

# Define the input (parameter) space, seeded with the random state
input_dim_1 = DiscreteDimension('dim1', 1, 1000)
input_dim_2 = DiscreteDimension('dim2', 1, 1000)
input_space = SimpleHypergrid(
    name='input', dimensions=[input_dim_1, input_dim_2], random_state=random_state)
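
As a sanity check, drawing directly from the seeded space is deterministic in isolation. The sketch below assumes Hypergrid.random() and the settable random_state property that the workaround later in this issue relies on:

def fresh_state(seed=RANDOM_SEED):
    state = random.Random()
    state.seed(seed)
    return state

# Two identically seeded spaces should produce identical direct draws
input_space.random_state = fresh_state()
points_a = [input_space.random() for _ in range(10)]

input_space.random_state = fresh_state()
points_b = [input_space.random() for _ in range(10)]

assert points_a == points_b  # direct draws are reproducible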

You can then use the input_space hypergrid to set up a simple optimization problem and eventually obtain (initially random) suggestions from the underlying Bayesian optimizer, as follows:

from mlos.Optimizers.BayesianOptimizerConfigStore import bayesian_optimizer_config_store
from mlos.Optimizers.BayesianOptimizerFactory import BayesianOptimizerFactory
from mlos.Optimizers.OptimizationProblem import OptimizationProblem, Objective
from mlos.Spaces import ContinuousDimension

# Define the output (objective) space
output_dim = ContinuousDimension(name="throughput", min=0, max=10**8)
output_space = SimpleHypergrid(name="objective", dimensions=[output_dim])

# Define optimization problem
optimization_problem = OptimizationProblem(
    parameter_space=input_space,
    objective_space=output_space,
    objectives=[Objective(name="throughput", minimize=False)]
)

# Initialize optimizer
optimizer_config = bayesian_optimizer_config_store.default
optimizer_factory = BayesianOptimizerFactory()
optimizer = optimizer_factory.create_local_optimizer(
    optimization_problem=optimization_problem,
    optimizer_config=optimizer_config
)

suggestion = optimizer.suggest()

From my understanding, each time the above code is run, the suggested point should be the same, since we have fixed the random state of the input parameter space. However, this is not the case: each run generates a different random point (see the full example at the end). It seems that the random state is not retained.

One possible workaround I've found is to explicitly replace the parameter space's random state after the optimizer has been constructed, which seems to solve this problem:

# Replace the random state of all input parameter space (sub)dimensions
optimizer.experiment_designer   \
        .optimization_problem   \
        .parameter_space        \
        .random_state = random_state

Moreover, I've found that the ExperimentDesigner class creates its own random number generator, which does not respect global random seeds:

https://github.com/microsoft/MLOS/blob/dd56ab4125542cad0e68d9754ca5a6021953e63e/source/Mlos.Python/mlos/Optimizers/ExperimentDesigner/ExperimentDesigner.py#L118

In the above case, even setting numpy.random.seed() has no effect: a numpy.random.Generator built on a numpy.random.PCG64 bit generator is independent of the global NumPy seed (https://numpy.org/doc/stable/reference/random/bit_generators/pcg64.html#numpy.random.Generator).
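
A quick standalone demonstration of this behavior, using plain NumPy (nothing MLOS-specific):

import numpy as np

# Seeding the legacy global RandomState has no effect on a Generator
# built on its own PCG64 bit generator: without an explicit seed,
# PCG64 pulls fresh entropy from the OS.
np.random.seed(42)
gen_a = np.random.Generator(np.random.PCG64())
np.random.seed(42)
gen_b = np.random.Generator(np.random.PCG64())
print(gen_a.integers(0, 1000, size=3))  # almost certainly differs ...
print(gen_b.integers(0, 1000, size=3))  # ... from this one

# Reproducibility requires seeding the bit generator itself:
gen_c = np.random.Generator(np.random.PCG64(seed=42))
gen_d = np.random.Generator(np.random.PCG64(seed=42))
assert (gen_c.integers(0, 1000, size=3) == gen_d.integers(0, 1000, size=3)).all()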

This sort of thing makes it hard for people to run reproducible experiments without diving into the internal MLOS code. The developer experience would be much better if there were a single point for setting the random seed used throughout MLOS: maybe something like mlos.set_random_seed()?
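
For illustration only, such a helper (hypothetical; set_random_seed is not part of MLOS today) might look roughly like this, bundling the manual workarounds above into one call:

import random
import numpy as np

def set_random_seed(seed, optimizer=None):
    """Hypothetical single entry point for seeding MLOS randomness.

    Seeds Python's and NumPy's global RNGs and, if an optimizer is
    given, replaces its internal generators the same way the manual
    workaround above does.
    """
    random.seed(seed)
    np.random.seed(seed)
    if optimizer is not None:
        # Replace the ExperimentDesigner's internal numpy Generator ...
        optimizer.experiment_designer.rng = \
            np.random.Generator(np.random.PCG64(seed=seed))
        # ... and the random state of the parameter space (and its
        # subdimensions, via the random_state setter).
        optimizer.experiment_designer \
                 .optimization_problem \
                 .parameter_space \
                 .random_state = random.Random(seed)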

PS: Full example highlighting the above issues:

import random
import numpy as np

from mlos.Optimizers.BayesianOptimizerConfigStore import bayesian_optimizer_config_store
from mlos.Optimizers.BayesianOptimizerFactory import BayesianOptimizerFactory
from mlos.Optimizers.OptimizationProblem import OptimizationProblem, Objective
from mlos.Spaces import SimpleHypergrid, ContinuousDimension, DiscreteDimension

RANDOM_SEED = 42

def create_optimizer(random_state):
    # Define spaces
    input_dim_1 = DiscreteDimension('dim1', 1, 1000)
    input_dim_2 = DiscreteDimension('dim2', 1, 1000)
    input_space = SimpleHypergrid(name='input',
        dimensions=[input_dim_1, input_dim_2], random_state=random_state)

    output_dim = ContinuousDimension(name="throughput", min=0, max=10**8)
    output_space = SimpleHypergrid(name="objective", dimensions=[output_dim])

    # Define optimization problem
    optimization_problem = OptimizationProblem(
        parameter_space=input_space,
        objective_space=output_space,
        objectives=[Objective(name="throughput", minimize=False)]
    )

    # Initialize optimizer
    optimizer_config = bayesian_optimizer_config_store.default
    optimizer_factory = BayesianOptimizerFactory()
    return optimizer_factory.create_local_optimizer(
        optimization_problem=optimization_problem,
        optimizer_config=optimizer_config
    )

## Generate points from optimizer
random_state_1 = random.Random()
random_state_1.seed(RANDOM_SEED)
optimizer_1 = create_optimizer(random_state_1)
gen_points_1 = [ optimizer_1.suggest() for _ in range(100) ]

## Generate points from separate (identical) optimizer
random_state_2 = random.Random()
random_state_2.seed(RANDOM_SEED)
optimizer_2 = create_optimizer(random_state_2)
gen_points_2 = [ optimizer_2.suggest() for _ in range(100) ]

# This will fail!
assert gen_points_1 == gen_points_2

## Generate points from optimizer
random_state_3 = random.Random()
random_state_3.seed(RANDOM_SEED)
optimizer_3 = create_optimizer(random_state_3)
# Replace the random number generator (experiment_designer.rng, a numpy Generator)
optimizer_3.experiment_designer.rng = (
    np.random.Generator(np.random.PCG64(seed=RANDOM_SEED)))
# Replace the random state of all input parameter space (sub)dimensions
optimizer_3.experiment_designer   \
        .optimization_problem   \
        .parameter_space        \
        .random_state = random_state_3

gen_points_3 = [ optimizer_3.suggest() for _ in range(100) ]

## Generate points from (identical) optimizer
random_state_4 = random.Random()
random_state_4.seed(RANDOM_SEED)
optimizer_4 = create_optimizer(random_state_4)
# Replace the random number generator (experiment_designer.rng, a numpy Generator)
optimizer_4.experiment_designer.rng = (
    np.random.Generator(np.random.PCG64(seed=RANDOM_SEED)))
# Replace the random state of all input parameter space (sub)dimensions
optimizer_4.experiment_designer   \
        .optimization_problem   \
        .parameter_space        \
        .random_state = random_state_4

gen_points_4 = [ optimizer_4.suggest() for _ in range(100) ]

# This works!
assert gen_points_3 == gen_points_4

kkanellis · Feb 22 '21 20:02