
[GENERAL SUPPORT]: MOO with "default settings" performs poorly after upgrading from 0.5.0 to 1.0.0

Open • agrimsm opened this issue 7 months ago • 12 comments

Question

I'm running a two-objective optimization that pretty much uses "default settings". After translating it to the new v1.0.0 API, the optimization performs very poorly. I get warnings like

.venv/lib/python3.12/site-packages/linear_operator/utils/cholesky.py:40: NumericalWarning: A not p.d., added jitter of 1.0e-08 to the diagonal

and

.venv/lib/python3.12/site-packages/botorch/fit.py:215: OptimizationWarning: `scipy_minimize` terminated with status OptimizationStatus.FAILURE, displaying original message from `scipy.optimize.minimize`: ABNORMAL_TERMINATION_IN_LNSRCH

Eventually I also get:

WARNING 05-27 09:58:48] ax.models.torch.botorch_modular.surrogate: Model ModelConfig(botorch_model_class=None, model_options={}, mll_class=<class 'gpytorch.mlls.exact_marginal_log_likelihood.ExactMarginalLogLikelihood'>, mll_options={}, input_transform_classes=[<class 'botorch.models.transforms.input.Warp'>, <class 'botorch.models.transforms.input.Normalize'>], input_transform_options={'Normalize': {'center': 0.0}}, outcome_transform_classes=None, outcome_transform_options={}, covar_module_class=<class 'gpytorch.kernels.linear_kernel.LinearKernel'>, covar_module_options={}, likelihood_class=None, likelihood_options={}, name='LinearKernel with Warp') failed to fit with error All attempts to fit the model have failed.. Skipping.

and the optimizer is no longer able to find the Pareto front I got when running with v0.5.0.

Here is my experiment/client setup, first with 0.5.0:

from ax.service.ax_client import AxClient, ObjectiveProperties

def setup_ax_client(num_init_trials):
    ax_client = AxClient()
    ax_client.create_experiment(
        name="asdf",
        parameters=[
            {
                "name": "param1",
                "type": "range",
                "bounds": [0.3, 0.5],
            },
            {
                "name": "param2",
                "type": "range",
                "bounds": [0.0, 0.5],
            },
            {
                "name": "fixed_param",
                "type": "fixed",
                "value": 1.0,
            },
            {
                "name": "param3",
                "type": "range",
                "bounds": [0.5, 1.5],
            },
            {
                "name": "param4",
                "type": "range",
                "bounds": [0.8, 4.0],
            },
            {
                "name": "param5",
                "type": "range",
                "bounds": [0.1, 2.0],
            },
            {
                "name": "param6",
                "type": "range",
                "bounds": [0.1, 2.0],
            },
        ],
        objectives={
            "obj1": ObjectiveProperties(minimize=False, threshold=0.5e6),
            "obj2": ObjectiveProperties(minimize=True, threshold=0.65),
        },
        outcome_constraints=[
            "c1 <= 1.1", 
            "c1 >= 0.9",
            "c2 >= 150e6",
            "c3 >= 100e6",
            "c4 <= 1.0e-2",
        ],
        tracking_metric_names=[
            "tracking_metric1",
        ],
        choose_generation_strategy_kwargs={"num_initialization_trials": num_init_trials},
        overwrite_existing_experiment=True,
    )
    return ax_client

and here is the equivalent setup with v1.0.0 (note that I removed the "fixed" input parameter and the tracking metric, as the v1.0.0 API no longer has an interface for them, as far as I can tell):

import ax

def setup_ax_client(num_init_trials):
    # Create client
    client = ax.Client()
    
    # Configure experiment parameters
    parameters = [
        ax.RangeParameterConfig(
            name="param1",
            parameter_type="float",
            bounds=(0.3, 0.5),
        ),
        ax.RangeParameterConfig(
            name="param2",
            parameter_type="float",
            bounds=(0.0, 0.5),
        ),
        # Fixed parameter using choice parameter with a single value
        # ax.ChoiceParameterConfig(
        #     name="fixed_param",
        #     parameter_type="float",
        #     values=[1.0],
        # ),
        ax.RangeParameterConfig(
            name="param3",
            parameter_type="float",
            bounds=(0.5, 1.5),
        ),
        ax.RangeParameterConfig(
            name="param4",
            parameter_type="float",
            bounds=(0.8, 4.0),
        ),
        ax.RangeParameterConfig(
            name="param5",
            parameter_type="float",
            bounds=(0.1, 2.0),
        ),
        ax.RangeParameterConfig(
            name="param6",
            parameter_type="float",
            bounds=(0.1, 2.0),
        ),
    ]
    
    # Configure experiment
    client.configure_experiment(
        name="asdf",
        parameters=parameters,
    )
    
    # Configure generation strategy
    client.configure_generation_strategy(
        method="balanced",  # I also tried "fast"
        initialization_budget=num_init_trials,
        min_observed_initialization_trials=num_init_trials,
    )
    
    client.configure_optimization(
        objective="obj1, -obj2",
        outcome_constraints=[
            "obj1 >= 0.5e6",
            "obj2 <= 0.65",
            "c1 <= 1.1",
            "c1 >= 0.9",
            "c2 >= 150e6", 
            "c3 >= 100e6",
            "c4 <= 1.0e-2",
        ]
    )
    
    # Since there's no direct API for tracking_metric_names in Ax 1.0.0,
    # we access the internal API to add tracking metrics
    # client._experiment.add_tracking_metric(ax.core.metric.Metric(name="tracking_metric1"))
    
    return client
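
For reference, here is how I would try to restore the two dropped pieces by combining the snippets commented out above. This is only a sketch; I have not verified either workaround against the 1.0.0 API:

from ax.core.metric import Metric

# Unverified: a single-value choice parameter as a stand-in for the old
# "fixed" parameter type. Add this to the `parameters` list above before
# calling configure_experiment.
fixed_param = ax.ChoiceParameterConfig(
    name="fixed_param",
    parameter_type="float",
    values=[1.0],
)

# Unverified: attach the tracking metric through the internal _experiment
# attribute, since I found no public interface for tracking metrics.
client._experiment.add_tracking_metric(Metric(name="tracking_metric1"))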

I'm guessing 0.5.0 and 1.0.0 use inequivalent generation strategies? They look pretty similar, though. For the 0.5.0 version:

print(ax_client.generation_strategy)

displays

GenerationStrategy(name='Sobol+BoTorch', steps=[Sobol for 10 trials, BoTorch for subsequent trials])

while for the 1.0.0 version I have

print(client._maybe_generation_strategy)

displays

GenerationStrategy(name='Center+Sobol+MBM:balanced', nodes=[CenterGenerationNode(next_node_name='Sobol'), GenerationNode(node_name='Sobol', model_specs=[GeneratorSpec(model_enum=Sobol, model_key_override=None)], transition_criteria=[MinTrials(transition_to='MBM'), MinTrials(transition_to='MBM')]), GenerationNode(node_name='MBM', model_specs=[GeneratorSpec(model_enum=BoTorch, model_key_override=None)], transition_criteria=[])])

and

print(client._experiment.optimization_config)

displays

MultiObjectiveOptimizationConfig(objective=MultiObjective(objectives=[Objective(metric_name="obj1", minimize=False), Objective(metric_name="obj2", minimize=True)]), outcome_constraints=[OutcomeConstraint(c1 <= 1.1), OutcomeConstraint(c1 >= 0.9), OutcomeConstraint(c2 >= 150000000.0), OutcomeConstraint(c3 >= 100000000.0), OutcomeConstraint(c4 <= 0.01)], objective_thresholds=[ObjectiveThreshold(obj1 >= 500000.0), ObjectiveThreshold(obj2 <= 0.65)])



Code of Conduct

  • [x] I agree to follow this project's Code of Conduct

agrimsm, May 27 '25 16:05

@agrimsm Thanks for reporting this! Are you able to post some snapshots of your results, or potentially the output of client._experiment.to_df() to help us debug internally?

bernardbeckerman, May 27 '25 19:05

Thanks for providing enough detail to rule out most simple explanations. Along these lines, as a quick check: have you set the fixed_param value to 1.0 inside your evaluation function for the Ax 1.0.0 optimization? You may have already done this, but it's worth checking, just to make sure the omission of fixed_param is not causing the observed differences.
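
For example (hypothetical, since we haven't seen your evaluation code), something along these lines before each evaluation:

# Hypothetical check: re-insert the removed fixed parameter so the 1.0.0
# runs see exactly the same inputs as the 0.5.0 runs did.
parameters["fixed_param"] = 1.0
result, metadata = evaluate(parameters)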

bernardbeckerman, May 27 '25 19:05

> Thanks for providing enough detail to rule out most simple explanations. Along these lines, as a quick check: have you set the fixed_param value to 1.0 inside your evaluation function for the Ax 1.0.0 optimization? You may have already done this, but it's worth checking, just to make sure the omission of fixed_param is not causing the observed differences.

Thanks, good suggestion. I verified that this is handled correctly.

agrimsm, May 27 '25 20:05

> @agrimsm Thanks for reporting this! Are you able to post some snapshots of your results, or potentially the output of client._experiment.to_df() to help us debug internally?

Attaching the output of client._experiment.to_df(). As you can see, it does complete trials, but the results are poor compared to the v0.5.0 results (it does not find the Pareto front), and the run produces the warnings I mentioned.

client_experiment_df.csv

Here is also some more stdout output from the optimization run. The INFO statements are mine; the rest is output from Ax. (I've shortened some of the paths in the output.)

2025-05-27 13:38:55,695 - INFO - Got next trial parameters in 94.84 seconds
2025-05-27 13:39:08,129 - INFO - Parameter evaluation completed in 12.43 seconds
2025-05-27 13:39:08,203 - INFO - Completed trial 36 in 107.35 seconds
2025-05-27 13:39:08,203 - INFO - Running trial 37/300
.venv/lib/python3.12/site-packages/botorch/fit.py:215: OptimizationWarning: `scipy_minimize` terminated with status OptimizationStatus.FAILURE, displaying original message from `scipy.optimize.minimize`: ABNORMAL_TERMINATION_IN_LNSRCH
  result = optimizer(mll, closure=closure, **optimizer_kwargs)
.venv/lib/python3.12/site-packages/botorch/fit.py:215: OptimizationWarning: `scipy_minimize` terminated with status OptimizationStatus.FAILURE, displaying original message from `scipy.optimize.minimize`: ABNORMAL_TERMINATION_IN_LNSRCH
  result = optimizer(mll, closure=closure, **optimizer_kwargs)
.venv/lib/python3.12/site-packages/botorch/fit.py:215: OptimizationWarning: `scipy_minimize` terminated with status OptimizationStatus.FAILURE, displaying original message from `scipy.optimize.minimize`: ABNORMAL_TERMINATION_IN_LNSRCH
  result = optimizer(mll, closure=closure, **optimizer_kwargs)
.venv/lib/python3.12/site-packages/botorch/fit.py:215: OptimizationWarning: `scipy_minimize` terminated with status OptimizationStatus.FAILURE, displaying original message from `scipy.optimize.minimize`: ABNORMAL_TERMINATION_IN_LNSRCH
  result = optimizer(mll, closure=closure, **optimizer_kwargs)
.venv/lib/python3.12/site-packages/botorch/fit.py:215: OptimizationWarning: `scipy_minimize` terminated with status OptimizationStatus.FAILURE, displaying original message from `scipy.optimize.minimize`: ABNORMAL_TERMINATION_IN_LNSRCH
  result = optimizer(mll, closure=closure, **optimizer_kwargs)
[WARNING 05-27 13:39:11] ax.models.torch.botorch_modular.surrogate: Model ModelConfig(botorch_model_class=None, model_options={}, mll_class=<class 'gpytorch.mlls.exact_marginal_log_likelihood.ExactMarginalLogLikelihood'>, mll_options={}, input_transform_classes=[<class 'botorch.models.transforms.input.Warp'>, <class 'botorch.models.transforms.input.Normalize'>], input_transform_options={'Normalize': {'center': 0.0}}, outcome_transform_classes=None, outcome_transform_options={}, covar_module_class=<class 'gpytorch.kernels.linear_kernel.LinearKernel'>, covar_module_options={}, likelihood_class=None, likelihood_options={}, name='LinearKernel with Warp') failed to fit with error All attempts to fit the model have failed.. Skipping.
.venv/lib/python3.12/site-packages/botorch/fit.py:215: OptimizationWarning: `scipy_minimize` terminated with status OptimizationStatus.FAILURE, displaying original message from `scipy.optimize.minimize`: ABNORMAL_TERMINATION_IN_LNSRCH
  result = optimizer(mll, closure=closure, **optimizer_kwargs)
.venv/lib/python3.12/site-packages/botorch/fit.py:215: OptimizationWarning: `scipy_minimize` terminated with status OptimizationStatus.FAILURE, displaying original message from `scipy.optimize.minimize`: ABNORMAL_TERMINATION_IN_LNSRCH
  result = optimizer(mll, closure=closure, **optimizer_kwargs)
.venv/lib/python3.12/site-packages/botorch/fit.py:215: OptimizationWarning: `scipy_minimize` terminated with status OptimizationStatus.FAILURE, displaying original message from `scipy.optimize.minimize`: ABNORMAL_TERMINATION_IN_LNSRCH
  result = optimizer(mll, closure=closure, **optimizer_kwargs)
.venv/lib/python3.12/site-packages/botorch/fit.py:215: OptimizationWarning: `scipy_minimize` terminated with status OptimizationStatus.FAILURE, displaying original message from `scipy.optimize.minimize`: ABNORMAL_TERMINATION_IN_LNSRCH
  result = optimizer(mll, closure=closure, **optimizer_kwargs)
.venv/lib/python3.12/site-packages/botorch/fit.py:215: OptimizationWarning: `scipy_minimize` terminated with status OptimizationStatus.FAILURE, displaying original message from `scipy.optimize.minimize`: ABNORMAL_TERMINATION_IN_LNSRCH
  result = optimizer(mll, closure=closure, **optimizer_kwargs)
[WARNING 05-27 13:39:13] ax.models.torch.botorch_modular.surrogate: Model ModelConfig(botorch_model_class=None, model_options={}, mll_class=<class 'gpytorch.mlls.exact_marginal_log_likelihood.ExactMarginalLogLikelihood'>, mll_options={}, input_transform_classes=[<class 'botorch.models.transforms.input.Warp'>, <class 'botorch.models.transforms.input.Normalize'>], input_transform_options={'Normalize': {'center': 0.0}}, outcome_transform_classes=None, outcome_transform_options={}, covar_module_class=<class 'gpytorch.kernels.linear_kernel.LinearKernel'>, covar_module_options={}, likelihood_class=None, likelihood_options={}, name='LinearKernel with Warp') failed to fit with error All attempts to fit the model have failed.. Skipping.
.venv/lib/python3.12/site-packages/linear_operator/utils/cholesky.py:40: NumericalWarning: A not p.d., added jitter of 1.0e-08 to the diagonal
  warnings.warn(
2025-05-27 13:39:59,768 - INFO - Got next trial parameters in 51.57 seconds

agrimsm, May 27 '25 20:05

This is super useful! Also please post the results from the Ax 0.5.0 optimization, if you still have them.

Also thank you for posting the logs - this helps illustrate the log spew issue you mentioned earlier.

bernardbeckerman, May 27 '25 21:05

Did you see similar degradation in 1.0.0 using method="fast"?

sdaulton, May 28 '25 00:05

> This is super useful! Also please post the results from the Ax 0.5.0 optimization, if you still have them.
>
> Also thank you for posting the logs - this helps illustrate the log spew issue you mentioned earlier.

I just generated some data with v0.5.0 and some more with v1.0.0. The latter also runs slower, so it takes much longer to collect data.

client_experiment_df_v0.5.0_new.csv

client_experiment_df_v1.0.0.csv

> Did you see similar degradation in 1.0.0 using method="fast"?

"fast" does not lead to the same numerical warnings, but the results are still poor in terms of finding the Pareto front .. or any points that satisfy the constraints really.

If you provide me with a code snippet that sets up an optimization strategy identical to the one used by v0.5.0, I can try it and check that there isn't something else going on.

agrimsm, May 28 '25 00:05

Are these results for 1.0.0 using "fast" or "balanced"? Are you supplying objective thresholds for the 1.0.0 optimization?

sdaulton, May 30 '25 21:05

Also, what does your optimization loop look like for generating arms and reporting evaluations with the client?

sdaulton, May 30 '25 21:05

> Are these results for 1.0.0 using "fast" or "balanced"? Are you supplying objective thresholds for the 1.0.0 optimization?

The numerical warnings come when running "balanced". I also tried "fast", which does not give warnings, but it also does not find the Pareto front that I am able to find consistently with v0.5.0.

> Also, what does your optimization loop look like for generating arms and reporting evaluations with the client?

It's challenging for me to share the actual evaluation function for the objectives, but the outer loop looks like this. For v0.5.0:

    for i in range(total_trials):
        trial_start = time.time()
        logger.info(f"Running trial {i+1}/{total_trials}")

        # Get next trial
        get_trial_start = time.time()
        parameters, trial_index = ax_client.get_next_trial()
        get_trial_time = time.time() - get_trial_start
        logger.info(f"Got next trial parameters in {get_trial_time:.2f} seconds")

        # Evaluate parameters
        eval_start = time.time()
        result, metadata = evaluate(parameters)
        eval_time = time.time() - eval_start
        logger.info(f"Parameter evaluation completed in {eval_time:.2f} seconds")

        # Complete trial
        ax_client.complete_trial(trial_index=trial_index, raw_data=result)
        trial = ax_client.experiment.trials[trial_index]

        for key, val in metadata.items():
            trial.run_metadata[key] = val

        ax_client.save_to_json_file(output_file)

        # Report trial time
        trial_time = time.time() - trial_start
        logger.info(f"Completed trial {i+1} in {trial_time:.2f} seconds")

and for v1.0.0:

    for i in range(total_trials):
        trial_start = time.time()
        logger.info(f"Running trial {i+1}/{total_trials}")

        # Get next trial - updated for 1.0.0 API
        get_trial_start = time.time()
        trials_dict = client.get_next_trials(max_trials=1)
        trial_index, parameters = trials_dict.popitem()
        get_trial_time = time.time() - get_trial_start
        logger.info(f"Got next trial parameters in {get_trial_time:.2f} seconds")

        # Evaluate parameters
        eval_start = time.time()
        result, metadata = evaluate(parameters)
        eval_time = time.time() - eval_start
        logger.info(f"Parameter evaluation completed in {eval_time:.2f} seconds")

        # Complete trial
        client.complete_trial(trial_index=trial_index, raw_data=result)
        
        # Add metadata to trial - using internal _experiment attribute
        for key, val in metadata.items():
            client._experiment.trials[trial_index].run_metadata[key] = val

        client.save_to_json_file(output_file)

        # Report trial time
        trial_time = time.time() - trial_start
        logger.info(f"Completed trial {i+1} in {trial_time:.2f} seconds")

agrimsm, Jun 04 '25 21:06

Could someone provide a v1.0.0 implementation, using the lower-level "custom generators via Modular BoTorch" interface, that would be identical to my v0.5.0 implementation? Have I provided enough information?

agrimsm, Jun 20 '25 22:06

Hi @agrimsm, sorry for the delay here! I think you can try this: client.set_generation_strategy(choose_generation_strategy_legacy(search_space=client._experiment.search_space)), using your Client instance. Let us know how it goes!
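
Spelled out a bit more (a sketch only: the import path is my best guess for 1.0.0, and I haven't checked whether num_initialization_trials is still accepted by the legacy dispatcher):

# Sketch of the suggested workaround: swap in the legacy (0.5.0-style)
# generation strategy on the 1.0.0 client, overriding the one set in
# configure_generation_strategy.
from ax.generation_strategy.dispatch_utils import choose_generation_strategy_legacy

client = setup_ax_client(num_init_trials=10)
client.set_generation_strategy(
    choose_generation_strategy_legacy(
        search_space=client._experiment.search_space,
        num_initialization_trials=10,  # same kwarg as in the 0.5.0 setup
    )
)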

lena-kashtelyan, Jun 24 '25 22:06