Ax icon indicating copy to clipboard operation
Ax copied to clipboard

Random vs Acquisition function selected Candidates

Open ramseyissa opened this issue 2 years ago • 50 comments

Hello, I am currently trying to compare two methods. I have about 4000 data points (10 features and 1 target value).

Method 1: I randomly select candidates 10%, 20%, 30%, etc., and trained a model (sklearn regression model) to try to see at what percent is the models performance satisfactory (an R2 value of 90 or above) based on the amount of randomly selected data i give the model.

Method 2: I want to initialize a GP model using Ax, beginning with 10% percent of my data (using this as a sort of SOBOL points to give the model an idea of the design space), and then have it select from the remaining list of candidates what is the next best point to choose based on the Acquisition function (that would give the model the most information about the design space). I then want to attached this candidate and continue through the remaining list of candidates.

Method 2 is where i am having trouble. Not exactly sure how to go about this.

My current Ax experience: I have used Ax previously, to attach my experimental data (ax_client.attach_trial), and then have it complete the trial using ax_client.complete_trial. After that I had it generate what the next best experiment to do using ax_client.get_next_trials.

I have also read through #771 but am having trouble in implementing this.

Thank you in advance!

ramseyissa avatar Mar 28 '23 20:03 ramseyissa

So this sounds like an active learning type problem where your goal is to find the best model across the whole design space ("best" in some yet-to-be specified sense)? This would mean that you'd need to use an acquisition function that specifically attempts to do this - standard acquisition functions for optimization such as Expected Improvement won't do the job here. There is e.g. [qNegIntegratedPosteriorVariance] (https://github.com/pytorch/botorch/blob/main/botorch/acquisition/active_learning.py#L40) that could be used to that end. I am not sure if we have any examples for how to use it in ax, but it should be possible to do this via the modular BoTorch model setup: https://github.com/facebook/Ax/blob/main/tutorials/modular_botax.ipynb

Once that's hooked up, then you should be able to use evaluate_acquisition_function to evaluate the acquisition function on your held-out points to pick the one with the highest value. See also the modular BoTorch model tutorial for how to do this (it's not actually run in code but discussed in the comments there).

Balandat avatar Mar 29 '23 01:03 Balandat

@Balandat Thank you very much for the quick reply!

Yes exactly, this is an active learning problem. My main goal is to try to minimize the amount of data (of the 4000 samples) used to train a model, that would then be able to give me an accurate (say 80-90%) prediction on a validation set and eventually on a held-out test set.

Ok great I will take a look at the references.

So in this case can the surrogate model be any model I choose? Or are there any limitations? Or is the best choice here a GP.

ramseyissa avatar Mar 29 '23 21:03 ramseyissa

So in this case can the surrogate model be any model I choose? Or are there any limitations? Or is the best choice here a GP.

So in general you'll need some kind of model with uncertainty quantification if you want to use qNegIntegratedPosteriorVariance or similar acquisition functions that are based on minimizing global model uncertainty. A GP would be the obvious choice here (and should work out of the box, unlike some other models which you'd have to hook up).

Balandat avatar Mar 29 '23 21:03 Balandat

Ok that makes sense. So to validate this i would just fit and predict using the GP on the validation set iteratively after selecting the next best point (as in the sense that it would be my evaluation function)?

ramseyissa avatar Mar 29 '23 21:03 ramseyissa

yup, that makes sense to me!

Balandat avatar Mar 29 '23 21:03 Balandat

@Balandat Hello again, Thank you very much for the initial advice. I have a semi-working implementation (i say semi for the following reason). I notice when i evaluate the acquisition function qNegIntegratedPosteriorVariance, and print print(best_candidate_idx) i always get an output of tensor(0) for the index. But when i do this with UpperConfidenceBound the indices printed are random (which makes a bit more sense to me). Can you shed some light on this?

for i in range(len(X_candidates)):
    # Fit the GP
    gp = SingleTaskGP(X_train, Y_train)
    mll = ExactMarginalLogLikelihood(gp.likelihood, gp)
    fit_gpytorch_model(mll)

    # acquisition function
    acq_func = qNegIntegratedPosteriorVariance(gp, mc_points=X_candidates)
    # acq_func = UpperConfidenceBound(gp, beta=2.0)
    # acq_values = acq_func(X_candidates.unsqueeze(-2))

    # Evaluate the acquisition function on the candidate set
    acq_values = acq_func(X_candidates)

    # Find the index of the best candidate in the candidate set
    best_candidate_idx = torch.argmax(acq_values)
    print(best_candidate_idx)

    # add the best candidate to the training data
    X_train = torch.cat([X_train, X_candidates[best_candidate_idx].unsqueeze(0)])
    Y_train = torch.cat([Y_train, Y_candidates[best_candidate_idx].unsqueeze(0)])

    # remove the best candidate from candidates
    X_candidates = torch.cat([X_candidates[:best_candidate_idx], X_candidates[best_candidate_idx+1:]])
    Y_candidates = torch.cat([Y_candidates[:best_candidate_idx], Y_candidates[best_candidate_idx+1:]])

    # Predict MAE
    test_posterior = gp.posterior(X_test)
    test_predictions = test_posterior.mean.detach().numpy()
    mae = mean_absolute_error(Y_test, test_predictions)

Thank you in advance!

ramseyissa avatar Apr 01 '23 08:04 ramseyissa

Hmm I think you may be using qNegIntegratedPosteriorVariance incorrectly here. Note that the mc_points arguments takes a (typically relatively dense) set of points across the domain that are used for Monte Carlo integration. But by doing acq_func = qNegIntegratedPosteriorVariance(gp, mc_points=X_candidates) you're passing the candidate set. This results in the acquisition function approximating the integral of the posterior variance by only summing it up on at the candidate points (i.e. incredibly poorly). You should use a different set of points for MC integration instead.

Balandat avatar Apr 01 '23 15:04 Balandat

Ah, that makes sense. I updated my script based on your suggestion. I went ahead and added a subset of the data mc_points = 500. So what it does is select randomly 500 points from the candidates space and uses it as mc_points. The code below is inserted into the forloop under fit_gpytorch_model(mll). Unfortunately, the print statement for the best_index is still printing tensor(0). Was this approach along the right path that you mentioned?

mc_points_size = 500
mc_points_indices = np.random.choice(len(X_candidates), min(mc_points_size, len(X_candidates)), replace=False)
mc_points = X_candidates[mc_points_indices]

Thank you again for taking the time to respond to the numerous questions!

ramseyissa avatar Apr 02 '23 01:04 ramseyissa

I went ahead and added a subset of the data mc_points = 500. So what it does is select randomly 500 points from the candidates space and uses it as mc_points.

Oh so this means that you're ever only going to evaluate the posterior variance on a subset of the data points. The mc_points are meant to be points that have nothing to do with the candidates - they are just a set of points that are used for Monte-Carlo integration. If you take a look at eq. (2) in https://arxiv.org/pdf/1710.03206.pdf, the approach qNegIntegratedPosteriorVariance takes is to simply Monte-Carlo integrate the integrand over the domain. The mc_points are the samples from the domain to do this integration, they should be uniformly sampled from the domain (the more the better for accuracy, ofc subject to compute and memory limits). So if your domain is [0,1]^d, this would be something like torch.rand(num_mc_samples, d).

Balandat avatar Apr 02 '23 02:04 Balandat

Hey @Balandat, thanks for clarifying what the mc_points are suppose to be. I sort of went down the Botorch rabbit hole these last few days and I am starting to get a little more familiar with it 😅 . The first comment you mentioned with using the evaluate_acquisition_function makes a bit more sense now (going to tackle that next). I opted to use a different acquisition function with the snippet of code i posted earlier and just wanted to run this by you.

bounds = torch.tensor([[-5.0, -3.0, -4.0, -6.0, -11.0, -11.0, -4.0, -2.0, -2.0, -2.0],[4.0, 4.0, 4.0, 4.0, 1.0, 1.0, 3.0, 3.0, 3.0,3.0]],dtype=torch.float64)

candidate_set = draw_sobol_samples(bounds=bounds, n=1000, q=1).squeeze(-2)    

candidate_set = bounds[0] + (bounds[1] - bounds[0]) * candidate_set

iteration = []
mae_list = []
for i in range(100):
    # Fit the Gaussian Process model
    gp = SingleTaskGP(X_train, Y_train)
    mll = ExactMarginalLogLikelihood(gp.likelihood, gp)
    fit_gpytorch_model(mll)
    qMES = qMaxValueEntropy(gp,candidate_set)
    acq_values = qMES(X_candidates.unsqueeze(-2))
    best_candidate_idx = torch.argmax(acq_values)
    print(best_candidate_idx)

My concern here was just to make sure this is being used correctly ..i defined the acquisition function and used candidate_set drawn from draw_sobol_samples and then used qMES on the samples i am using for active learning (X_candidates).. acq_values = qMES(X_candidates.unsqueeze(-2))... after that i pull the max best_candidate_idx = torch.argmax(acq_values) and then added this point into the training set. Hope this makes sense!

ramseyissa avatar Apr 09 '23 08:04 ramseyissa

So here you're effectively optimizing the MES acquisition function on a finite set of random candidate points. That's not wrong per se, but one of the big benefits of the BoTorch acquisition functions is that they are differentiale and so you can use gradient-based optimizers to typically get much better solutions, esp. jn high dim spaces. Check out optimize_acqf for how to do this.

Balandat avatar Apr 09 '23 15:04 Balandat

EDIT: Just realized that my assumption is incorrect. It seems like the best way to go is evaluate_acquisition_function as it would allow for evaluating on my actual candidate list. Here optimize_acqf is not looking strictly which value in X_candidates optimizes the acquisition function.

ooh i see! I did some digging based on your last comment and found this notebook you contributed to max_value_entropy. Correct me if i am wrong here, the candidate_set in the notebook, would be my X_candidates that i am looking to evaluate..? From the code below this means the returned candidates and acq_value would be the next best candidate to select and train the gp on. Initially in the code above (in the last comment) i treated the candidate_set as we did the mc_points which maybe was not the right approach.

bounds = torch.tensor([[-5.0, -3.0, -4.0, -6.0, -11.0, -11.0, -4.0, -2.0, -2.0, -2.0],[4.0, 4.0, 4.0, 4.0, 1.0, 1.0, 3.0, 3.0, 3.0,3.0]],dtype=torch.float64)
# candidate_set = bounds[0] + (bounds[1] - bounds[0]) * candidate_set
# candidate_set = draw_sobol_samples(bounds=bounds, n=1000, q=1).squeeze(-2)    
# # candidate_set = torch.rand(1000, bounds.size(1), device=bounds.device, dtype=bounds.dtype)
# candidate_set = bounds[0] + (bounds[1] - bounds[0]) * candidate_set
iteration = []
mae_list = []
for i in range(len(X_candidates)):
    # Fit the Gaussian Process model
    gp = SingleTaskGP(X_train, Y_train)
    mll = ExactMarginalLogLikelihood(gp.likelihood, gp)
    fit_gpytorch_model(mll)
    qMES = qMaxValueEntropy(gp,X_candidates)
    candidates, acq_value = optimize_acqf(
    acq_function=qMES,
    bounds=bounds,
    q=1,
    num_restarts=10,
    raw_samples=512,
)

ramseyissa avatar Apr 10 '23 02:04 ramseyissa

Correct me if i am wrong here, the candidate_set in the notebook, would be my X_candidates that i am looking to evaluate..?

Not quite. The qMaxValueEntropy acquisition function, similar to qNegIntegratedPosteriorVariance also requires a discrete set of points candidate_set in the domain over which to draw samples of the max value (this is just due to how that acquisition function is implemented). Bur this is (maybe confusingly) not the same as X_candidate which would be the set of points from which you consider to pick one to evaluate next. In this case you can. Still optimize the acquisition function (for a given candidate_set numerically without having to pre specify a set of discrete points from which to choose.

i treated the candidate_set as we did the mc_points which maybe was not the right approach.

that is the right approach.

Balandat avatar Apr 12 '23 12:04 Balandat

Hey @Balandat, its been a little while, but i reevaluated my approach a bit (hopefully this is closer to what you suggested early on). See sample code please.

I have a few questions here:

  1. In this use case, do i need to optimize the acquisition function as in the previous use case (in the above example that we previously spoken about)? I ask because oddly enough my mean absolute error fluctuated a bit and did not decrease as expected (so i assume i am making a mistake somewhere).

  2. One issue i encountered was when i was trying to swap out qMaxValueEntropy for qNegIntegratedPosteriorVariance. For some reason a direct swap in this code raised an error. Here is the error raised.

RuntimeError: Input constructor for acquisition class qNegIntegratedPosteriorVariance not registered. Use the @acqf_input_constructor` decorator to register a new method.

Can this acquisition function be added to evaluate_acquisition_function so that it is compatible? (maybe i should be opening up a separate issue for this).

Here is the sample code:

gs = GenerationStrategy(
    steps=[
        GenerationStep(
            model=Models.BOTORCH_MODULAR, 
            num_trials=-1,  
            max_parallelism=3, 
            model_kwargs={"surrogate": Surrogate(SingleTaskGP),
                          "botorch_acqf_class": qMaxValueEntropy,
                }
        ),
    ]
)


test_set_dict = [ObservationFeatures(x) for x in X_test.to_dict(orient="records")]


ax_client = AxClient(generation_strategy=gs)
ax_client.create_experiment(
    name="Peak_Prediction", 
    parameters=parameters, 
    objective_name="ppm",
    # minimize=False, 
    )

# Attach and complete trials for the initial training set
for j in range(len(X_train)):
    _, trial_index = ax_client.attach_trial(X_train.iloc[j, :].to_dict())
    ax_client.complete_trial(
        trial_index=trial_index, raw_data=Y_train.iloc[j].to_dict()
    )
print("fitted Training data")
    
iteration=[]
mae_list = []
trials = 0
len_X_candidates = len(X_candidates)
for i in range(len_X_candidates):
    ax_client.fit_model()

    model_bridge = ax_client.generation_strategy.model

    observation_features = [
    ObservationFeatures(x) for x in X_candidates.to_dict(orient="records")
]
    
    acqf_values = model_bridge.evaluate_acquisition_function(
        observation_features=observation_features)

    max_value = max(acqf_values)
    max_index = acqf_values.index(max_value)
    next_experiment = X_candidates.iloc[max_index]
    Y_candidate_value = Y_candidates.iloc[max_index]
    

    # Add the features with the max value to X_train and remove from X_candidates
    X_train = pd.concat([X_train, next_experiment.to_frame().T], ignore_index=True)
    X_candidates = X_candidates.drop(X_candidates.index[max_index], axis=0)
  
    # Attach and complete the next trial
    _, trial_index = ax_client.attach_trial(next_experiment.to_dict())
    ax_client.complete_trial(
        trial_index=trial_index, raw_data=Y_candidate_value.to_dict()
    )
    trials = trials + 1
    model = ax_client.generation_strategy.model
    y_predicted = model.predict(test_set_dict)
    predicted_y_values = y_predicted[0]['ppm']
    mae = mean_absolute_error(Y_test, predicted_y_values)
    i = i + 1
    iteration.append(i)
    mae_list.append(mae)

Thank you for your time!

ramseyissa avatar Apr 29 '23 00:04 ramseyissa

@ramseyissa, maybe the following will be of help?

Note: copied from https://github.com/facebook/Ax/issues/1312

Here is a self-contained example using vanilla single-objective optimization based on https://github.com/facebook/Ax/issues/460#issuecomment-975233349:

from typing import Any, Dict, Optional

import torch

# from ax.core.objective import ScalarizedObjective
from ax.modelbridge import get_sobol
from ax.modelbridge.generation_strategy import GenerationStep, GenerationStrategy
from ax.modelbridge.registry import Models
from ax.models.torch.botorch_modular.surrogate import Surrogate
from ax.service.ax_client import AxClient
from botorch.acquisition.active_learning import (
    MCSampler,
    qNegIntegratedPosteriorVariance,
)
from botorch.acquisition.input_constructors import (
    MaybeDict,
    acqf_input_constructor,
    construct_inputs_mc_base,
)
from botorch.acquisition.objective import AcquisitionObjective
from botorch.models.gp_regression import SingleTaskGP
from botorch.models.model import Model
from botorch.utils.datasets import SupervisedDataset
from torch import Tensor


@acqf_input_constructor(qNegIntegratedPosteriorVariance)
def construct_inputs_qNIPV(
    model: Model,
    mc_points: Tensor,
    training_data: MaybeDict[SupervisedDataset],
    objective: Optional[AcquisitionObjective] = None,
    X_pending: Optional[Tensor] = None,
    sampler: Optional[MCSampler] = None,
    **kwargs: Any,
) -> Dict[str, Any]:

    if model.num_outputs == 1:
        objective = None

    base_inputs = construct_inputs_mc_base(
        model=model,
        training_data=training_data,
        sampler=sampler,
        X_pending=X_pending,
        objective=objective,
    )

    return {**base_inputs, "mc_points": mc_points}


def objective_function(x):
    f = x["x1"] ** 2 + x["x2"] ** 2 + x["x3"] ** 2
    return {"f": (f, None)}


parameters = [
    {"name": "x1", "type": "range", "bounds": [0.0, 1.0], "value_type": "float"},
    {"name": "x2", "type": "range", "bounds": [0.0, 1.0], "value_type": "float"},
    {"name": "x3", "type": "range", "bounds": [0.0, 1.0], "value_type": "float"},
]
ax_client_tmp = AxClient()
ax_client_tmp.create_experiment(parameters=parameters)
sobol = get_sobol(ax_client_tmp.experiment.search_space)
mc_points = sobol.gen(1024).param_df.values
mcp = torch.tensor(mc_points)

model_kwargs_val = {
    "surrogate": Surrogate(SingleTaskGP),
    "botorch_acqf_class": qNegIntegratedPosteriorVariance,
    "acquisition_options": {"mc_points": mcp},
}

gs = GenerationStrategy(
    steps=[
        GenerationStep(model=Models.SOBOL, num_trials=num_sobol),
        GenerationStep(
            model=Models.BOTORCH_MODULAR, num_trials=num_qnipv, model_kwargs=model_kwargs_val
        ),
    ]
)

ax_client = AxClient(generation_strategy=gs)
ax_client.create_experiment(
    name="active_learning_experiment",
    parameters=parameters,
    objective_name="f",
    minimize=True,
)

for _ in range(20):
    trial_params, trial_index = ax_client.get_next_trial()
    data = objective_function(trial_params)
    ax_client.complete_trial(trial_index=trial_index, raw_data=data["f"])

sgbaird avatar Apr 29 '23 17:04 sgbaird

@sgbaird, thank you very much, this was very helpful! I managed to tweak this a bit for my use case, and swapped out the qNIPV acquisition function for qMES using its own input constructor. I have a few question that you might be able to address here and if @Balandat has some input on this that would also be appreciated.

Ok so the question is, when calling evaluate_acquisition_function(observation_features=observation_features), observation features here is a dictionary of my candidate list that i am looking to pick the next sample from, do i need to call optimize_acqf function at some point before using evaluate_acquisition_function or is this automatically handled here?

Thank you both for your time!

This is the input constructor used for qMES for reference:

@acqf_input_constructor(qMaxValueEntropy)
def construct_inputs_qMES(
    model: Model,
    training_data: MaybeDict[SupervisedDataset],
    bounds: List[Tuple[float, float]],
    objective: Optional[MCAcquisitionObjective] = None,
    # posterior_transform: Optional[PosteriorTransform] = None,
    candidate_size: int = 1000,
    **kwargs: Any,
) -> Dict[str, Any]:
    r"""Construct kwargs for `qMaxValueEntropy` constructor."""
    inputs_mc = construct_inputs_mc_base(
        model=model,
        training_data=training_data,
        objective=objective,
        **kwargs,
    )

    X = _get_dataset_field(training_data, "X", first_only=True)
    _kw = {"device": X.device, "dtype": X.dtype}
    _rvs = torch.rand(candidate_size, len(bounds), **_kw)
    _bounds = torch.tensor(bounds, **_kw).transpose(0, 1)
    return {
        **inputs_mc,
        "candidate_set": _bounds[0] + (_bounds[1] - _bounds[0]) * _rvs,
        "maximize": kwargs.get("maximize", True),
    }

ramseyissa avatar May 15 '23 18:05 ramseyissa

I can't really comment on the accuracy of the construct_inputs_qMES function (other than, if it doesn't throw errors, that's a good sign 😉) . I don't think you need to deal with optimize_acqf if you're using evaluate_acquisition_function.

sgbaird avatar May 15 '23 18:05 sgbaird

Indeed, you're evaluating the acquisition function on a discrete set here using evaluate_acquisition_function, so you can just pick the best and not use optimize_acqf.

Balandat avatar May 20 '23 22:05 Balandat

Recapping some internal discussion, @ramseyissa is predicting peak positions for an NMR spectra, so a single set of inputs typically has multiple outputs. Depending on the number of new atoms added to each simulation, there could be anywhere from 1 to 16 peaks. I think the possible locations of the peak are on the order of 100's, if not continuous, and 16 outputs might be too much in terms of modeling output behavior.

One option would be to train a VAE to move a spectrum object back and forth from a low-dimensional representation (e.g., ~4 values) and a high-dimensional version (16 values). The ppm values could be sorted to reduce degeneracies or data augmentation could be applied (swapping values).

https://github.com/facebook/Ax/issues/1312#issue-1488742013

Then, I think ScalarizedPosteriorTransform gets used to compute a sort of joint acquisition function. Maybe this can be specified in the part of the generation strategy that gets passed more directly to BoTorch)

EDIT: @ramseyissa, also, I'm not sure if Max-value Entropy Search (MES) is the right model for your application since it still seems to be focused on minimizing or maximizing some output property that you're modeling rather than minimizing the global uncertainty, irrespective of whether the property is low or high. From https://botorch.org/tutorials/max_value_entropy:

Max-value entropy search (MES) acquisition function quantifies the information gain about the maximum of a black-box function

emphasis added

sgbaird avatar May 29 '23 21:05 sgbaird

@Balandat @sgbaird :

I have been trying to use qNIPV in the same way i used MES, but i keep getting an error that isn't making much sense to me. In my problem just to remind you, i am not looking to optimize my objective value, i am just looking to minimize my global uncertainty for a predefined list of candidates.

It seems that when using the construct_inputs_qNIPV it is not recognizing mc_points that is leading to the error (the error is seen at the bottom). Any suggestions on how to address this?

@acqf_input_constructor(qNegIntegratedPosteriorVariance)
def construct_inputs_qNIPV(
    model: Model,
    mc_points: Tensor,
    training_data: MaybeDict[SupervisedDataset],
    objective: Optional[AcquisitionObjective] = None,
    X_pending: Optional[Tensor] = None,
    sampler: Optional[MCSampler] = None,
    **kwargs: Any,
) -> Dict[str, Any]:

    if model.num_outputs == 1:
        objective = None

    base_inputs = construct_inputs_mc_base(
        model=model,
        training_data=training_data,
        sampler=sampler,
        X_pending=X_pending,
        objective=objective,
    )

    return {**base_inputs, "mc_points": mc_points}


# the objective function here is just predicting on a test set and calculating the mae
# with the model that is updated through an active learning loop
def objective_function(model, test_set_dict, test_set_target_vales):
    # get gp 
    model = ax_client.generation_strategy.model

    # predict on test set
    y_predicted = model.predict(test_set_dict)
    
    # get predicted values
    predicted_y_values = y_predicted[0]['ppm']
    
    # calculate mae
    mae = mean_absolute_error(test_set_target_vals, predicted_y_values)
    print(f'MAE: {mae}')
    return {"mae": (mae, None)}
    
init_len_of_x_train = len(train_dfs)

parameters = [
    {"name": vals, "type": "range", "bounds": [-15.0, 15.0], "value_type": "float"}
    for vals in descriptors_df.columns]

# this is used to get the mc_points for the acquisition function
ax_client_tmp = AxClient()
ax_client_tmp.create_experiment(parameters=parameters)
sobol = get_sobol(ax_client_tmp.experiment.search_space)
mc_points = sobol.gen(1024).param_df.values
mcp = torch.tensor(mc_points)


# dict of kwargs for BoTorch modular in generation strategy 
model_kwargs_val = {
    "surrogate": Surrogate(SingleTaskGP),
    "botorch_acqf_class": qNegIntegratedPosteriorVariance,
    "acquisition_options": {"mc_points": mcp},
}


gs = GenerationStrategy(
    steps=[
        # GenerationStep(model=Models.SOBOL, num_trials=5),
        GenerationStep(
            model=Models.BOTORCH_MODULAR, num_trials=-1
            , model_kwargs=model_kwargs_val
        ),
    ]
)


ax_client = AxClient(generation_strategy=gs)
ax_client.create_experiment(
    name="active_learning_experiment",
    parameters=parameters,
    objective_name="ppm",
    # minimize=True,
)

# test set 
test_set_dict = [ObservationFeatures(x) for df in test_dfs for x in df.to_dict(orient="records")]

#get test set target values 
test_set_target_vals = get_target_values(test_dfs, ppm_df, 'ppm')

#get train set target values
train_set_target_vals = get_target_values(train_dfs, ppm_df, 'ppm')

    
# my training data is a list of dataframes, so I need to attach each dataframe to ax_client
for df in train_dfs:
    for idx in range(len(df)):
        _, trial_index = ax_client.attach_trial(df.iloc[idx, :].to_dict())
        ax_client.complete_trial(
        trial_index=trial_index, raw_data=train_set_target_vals.iloc[idx].to_dict()
    )


    
index_of_candidate_set = []
iteration=[]
mae_list = []
trials = 0
len_X_candidates = len(x_candidates)
for i in range(len_X_candidates):
    print("size of X_candidates",len(x_candidates))
    print("size of X_train",len(train_dfs))
    
    # fit gp
    ax_client.fit_model()

    # get gp 
    model_bridge = ax_client.generation_strategy.model

    acqf_values = sum_acqf_values(x_candidates, model_bridge)
    
    #get max of acqf values
    max_value = max(acqf_values)
    
    #get index of max value
    max_index = acqf_values.index(max_value)
    
    # locate features in orginal candidate set
    next_experiment = x_candidates[max_index]
    
    next_exp_target_vals = get_target_values(next_experiment, ppm_df, 'ppm')
    
    # add the best candidate selected by acquisition fx experiment to training set
    train_dfs.append(next_experiment)
    
    #drop the best candidate selected by acquisition fx experiment from candidate set
    x_candidates.pop(max_index)
    
    # attach this new trail to ax_client, and complete it with the target value "ppm"
    for idx in range(len(next_experiment)):
        _, trial_index = ax_client.attach_trial(next_experiment.iloc[idx, :].to_dict())
        ax_client.complete_trial(
        trial_index=trial_index, raw_data=next_exp_target_vals.iloc[idx].to_dict())
    

    trials = trials + 1
     
    model = ax_client.generation_strategy.model

    mae_vals = objective_function(model, test_set_dict, test_set_target_vals)

    i = i + 1
    iteration.append(i)
    print(f"Iteration {i+1}: MAE = {mae_vals:.4f}")
    mae_list.append(mae_vals)
    index_of_candidate_set.append(max_index)


exp_df = exp_to_df(ax_client.experiment)

num_of_zeros = len(train_dfs)

exp_df['MAE'] = [0] * init_len_of_x_train + mae_list
# exp_df

The resulting error from the above script is:

---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
Cell In[31], line 193
    190     # get gp 
    191     model_bridge = ax_client.generation_strategy.model
--> 193     acqf_values = sum_acqf_values(x_candidates, model_bridge)
    194     # print("len of acqf_values",len(acqf_values))
    195 
    196     # turn list of candidates into dict of observation features
   (...)
    205     
    206 #     #get max of acqf values
    207     max_value = max(acqf_values)

Cell In[16], line 5, in sum_acqf_values(list_of_dfs, model_bridge)
      3 for df in list_of_dfs:
      4     observation_features = [ObservationFeatures(x) for x in df.to_dict(orient="records")]
----> 5     acqf_values = model_bridge.evaluate_acquisition_function(observation_features=observation_features)
      6     # print('acqf values', acqf_values)
      7     sum_acqf_values_list.append(sum(acqf_values))

File [/opt/miniconda3/envs/nmrpeaks/lib/python3.10/site-packages/ax/modelbridge/torch.py:434](https://file+.vscode-resource.vscode-cdn.net/opt/miniconda3/envs/nmrpeaks/lib/python3.10/site-packages/ax/modelbridge/torch.py:434), in TorchModelBridge.evaluate_acquisition_function(self, observation_features, search_space, optimization_config, pending_observations, fixed_features, acq_options)
    425         obs_feats[i] = t.transform_observation_features(batch)
    427 base_gen_args = self._get_transformed_gen_args(
    428     search_space=search_space,
...
    305 )
    306 self.acqf = botorch_acqf_class(**acqf_inputs)  # pyre-ignore [45]
    307 self.X_pending: Optional[Tensor] = unique_Xs_pending

TypeError: construct_inputs_qNIPV() missing 1 required positional argument: 'mc_points'

ramseyissa avatar Jun 26 '23 03:06 ramseyissa

@ramseyissa, can you provide a reproducer? i.e., something that can be copy-pasted and run standalone (preferably a link to a Google Colab notebook 😉)

sgbaird avatar Jun 26 '23 22:06 sgbaird

I have a MWE example for qNIPV, though without anything special. Just 3 float inputs and 1 float output. If you can use this script as a template and gradually adapt it to your use-case, you should be able to pinpoint the discrepancy (and possibly resolve the error you were running into).

EDIT: this needs to be updated to account for transformations per Max's comment https://github.com/facebook/Ax/issues/1557#issuecomment-1624582689

sgbaird avatar Jun 27 '23 05:06 sgbaird

@sgbaird,

I went through the MWE example for qNIPV that was referenced above and it was very helpful. I believe it is the same working example you referenced in https://github.com/facebook/Ax/issues/1557#issuecomment-1528840459 earlier on.

Ok so I think I found the issue. For some reason when you call model_bridge.evaluate_acquisition_function you must use the acq_options argument when using the qNIPV acquisition function, even if specifying acquisition_options in model_kwargs within GenerationStrategy, which differs from when i used the MES acquisition function (which also required MC points).

So I think all finally looks good, but maybe a final check from @Balandat would seal the deal (I can provide the reproducer @sgbaird was after :wink:)

ramseyissa avatar Jul 06 '23 19:07 ramseyissa

Ok so I think I found the issue. For some reason when you call model_bridge.evaluate_acquisition_function you must use the acq_options argument when using the qNIPV acquisition function, even if specifying acquisition_options in model_kwargs within GenerationStrategy, which differs from when i used the MES acquisition function (which also required MC points).

Hmm this is interesting. I didn't go into the weeds on this but I believe this might be due to the fact that qMaxValueEntropy has an input constructor that generating a set of candidate_size candidate_points. The difference to the acqf constructor for qNIPV used here is that candidate_size has a default value, so that gets used if no candidate_size is given in the acquisition_options : https://github.com/pytorch/botorch/blob/6f9818756b6a3f83ad52aa3a6cd7b2591bd3ce0e/botorch/acquisition/input_constructors.py#L865

Gotcha alert: One thing to note here is that the BoTorch model when wrapped in Ax operates in the transformed space (typically the unit cube if all parameters are continuous parameters). So mc_points here will need to be passed in that transformed space rather than the original space over which the parameters are defined. Otherwise you'll get some very funky results (e.g. say your parameters are specified on [1000, 2000]^d, and you choose your mc_points uniformly over the same set, then in the transformed space the model will operate on [0, 1]^d and so most if not all of the mc_points will not actually lie in what corresponds to the search space - in this case "very funky" would mean total garbage). Unfortunately it's at least at this point not straightforward to apply the Ax transforms automatically to the mc_points. Hopefully we can come up with a better solution for this in the future. cc @saitcakmak, @dme65

Balandat avatar Jul 07 '23 03:07 Balandat

@Balandat would it be easier to scale to [0, 1]^d outside of Ax using sklearn's MinMaxScaler? Or, assuming UnitX is the only transform at play, would you recommend using UnitX.transform_observation_features?

sgbaird avatar Jul 07 '23 23:07 sgbaird

@Balandat would it be easier to scale to [0, 1]^d outside of Ax using sklearn's MinMaxScaler? Or, assuming UnitX is the only transform at play, would you recommend using UnitX.transform_observation_features?

Either way should work. But yeah, scaling the data before passing it to Ax is the right thing to do here if you want to pass in the mc points (and assuming you have only continuous parameters).

Balandat avatar Jul 07 '23 23:07 Balandat

@Balandat does this look OK?

# MC Points
# WARNING: assumes only UnitX transform, https://ax.dev/docs/models.html#transforms
num_mc_sobol = 2**13
sobol = get_sobol(ax_client_tmp.experiment.search_space)
# mc_points = sobol.gen(1024).param_df.values
obs_features = [
    ObservationFeatures(parameters)
    for parameters in sobol.gen(num_mc_sobol).param_df.to_dict("records")
]
ux = UnitX(ax_client_tmp.experiment.search_space)
obs_features_ux = ux.transform_observation_features(obs_features)
mc_points = [list(obs.parameters.values()) for obs in obs_features_ux]
mcp = torch.tensor(mc_points)
mcp

sgbaird avatar Jul 08 '23 10:07 sgbaird

Hmm yeah that seems like it should work! I guess since UnitX transforms to the unit cube it should be enough to just generate the points directly without Ax involvement:

mcp = torch.quasirandom.SobolEngine(num_params, scramble=True).draw_base2(13)

Again this assumes that there aren't any more complex things going on such as choice parameters or the like..

Balandat avatar Jul 08 '23 18:07 Balandat

@Balandat, thanks for the insight.

Me and @sgbaird have had some internal discussion regarding some of the items discussed above. The result of this are some questions i need clarification on.

  1. In the reply below, when you mentioned "scaling the data before passing it to Ax is the right thing to do here.." did you also mean that the data that we are working with should be scaled (not just the mc points) but the training, x_candidates, and test set? I mean this as in a preprocessing step before beginning an Ax optimization strategy. Also, just a side note here, from what i understand Ax typically scales the data that is passed to it internally?

@Balandat would it be easier to scale to [0, 1]^d outside of Ax using sklearn's MinMaxScaler? Or, assuming UnitX is the only transform at play, would you recommend using UnitX.transform_observation_features?

Either way should work. But yeah, scaling the data before passing it to Ax is the right thing to do here if you want to pass in the mc points (and assuming you have only continuous parameters).

  1. Another clarification, when looking at a list of points that have been evaluated using evaluate_acquisition_function (when using the qNIPV acquisition function), should i be selecting the most negative point in the list, so the min value since it would be the most negative?

ramseyissa avatar Jul 10 '23 20:07 ramseyissa

  1. did you also mean that the data that we are working with should be scaled (not just the mc points) but the training, x_candidates, and test set?

Pretty sure the search space (i.e., parameters and related kwargs) can be left untouched as long as we scale mcp appropriately (i.e., the only change that needs to occur is in transforming mcp).

2. should i be selecting the most negative point in the list

My Intuition is to always take the maximum of the acquisition function, since acq fns are used to "... evaluate the usefulness of one of more design points ...". I'll defer to @Balandat though to clarify

sgbaird avatar Jul 10 '23 21:07 sgbaird