Ax
Ax copied to clipboard
Random vs Acquisition function selected Candidates
Hello, I am currently trying to compare two methods. I have about 4000 data points (10 features and 1 target value).
Method 1: I randomly select candidates 10%, 20%, 30%, etc., and trained a model (sklearn regression model) to try to see at what percent is the models performance satisfactory (an R2 value of 90 or above) based on the amount of randomly selected data i give the model.
Method 2: I want to initialize a GP model using Ax, beginning with 10% percent of my data (using this as a sort of SOBOL points to give the model an idea of the design space), and then have it select from the remaining list of candidates what is the next best point to choose based on the Acquisition function (that would give the model the most information about the design space). I then want to attached this candidate and continue through the remaining list of candidates.
Method 2 is where i am having trouble. Not exactly sure how to go about this.
My current Ax experience: I have used Ax previously, to attach my experimental data (ax_client.attach_trial), and then have it complete the trial using ax_client.complete_trial. After that I had it generate what the next best experiment to do using ax_client.get_next_trials.
I have also read through #771 but am having trouble in implementing this.
Thank you in advance!
So this sounds like an active learning type problem where your goal is to find the best model across the whole design space ("best" in some yet-to-be specified sense)? This would mean that you'd need to use an acquisition function that specifically attempts to do this - standard acquisition functions for optimization such as Expected Improvement won't do the job here. There is e.g. [qNegIntegratedPosteriorVariance] (https://github.com/pytorch/botorch/blob/main/botorch/acquisition/active_learning.py#L40) that could be used to that end. I am not sure if we have any examples for how to use it in ax, but it should be possible to do this via the modular BoTorch model setup: https://github.com/facebook/Ax/blob/main/tutorials/modular_botax.ipynb
Once that's hooked up, then you should be able to use evaluate_acquisition_function to evaluate the acquisition function on your held-out points to pick the one with the highest value. See also the modular BoTorch model tutorial for how to do this (it's not actually run in code but discussed in the comments there).
@Balandat Thank you very much for the quick reply!
Yes exactly, this is an active learning problem. My main goal is to try to minimize the amount of data (of the 4000 samples) used to train a model, that would then be able to give me an accurate (say 80-90%) prediction on a validation set and eventually on a held-out test set.
Ok great I will take a look at the references.
So in this case can the surrogate model be any model I choose? Or are there any limitations? Or is the best choice here a GP.
So in this case can the surrogate model be any model I choose? Or are there any limitations? Or is the best choice here a GP.
So in general you'll need some kind of model with uncertainty quantification if you want to use qNegIntegratedPosteriorVariance or similar acquisition functions that are based on minimizing global model uncertainty. A GP would be the obvious choice here (and should work out of the box, unlike some other models which you'd have to hook up).
Ok that makes sense. So to validate this i would just fit and predict using the GP on the validation set iteratively after selecting the next best point (as in the sense that it would be my evaluation function)?
yup, that makes sense to me!
@Balandat Hello again,
Thank you very much for the initial advice. I have a semi-working implementation (i say semi for the following reason).
I notice when i evaluate the acquisition function qNegIntegratedPosteriorVariance, and print print(best_candidate_idx) i always get an output of tensor(0) for the index. But when i do this with UpperConfidenceBound the indices printed are random (which makes a bit more sense to me). Can you shed some light on this?
for i in range(len(X_candidates)):
# Fit the GP
gp = SingleTaskGP(X_train, Y_train)
mll = ExactMarginalLogLikelihood(gp.likelihood, gp)
fit_gpytorch_model(mll)
# acquisition function
acq_func = qNegIntegratedPosteriorVariance(gp, mc_points=X_candidates)
# acq_func = UpperConfidenceBound(gp, beta=2.0)
# acq_values = acq_func(X_candidates.unsqueeze(-2))
# Evaluate the acquisition function on the candidate set
acq_values = acq_func(X_candidates)
# Find the index of the best candidate in the candidate set
best_candidate_idx = torch.argmax(acq_values)
print(best_candidate_idx)
# add the best candidate to the training data
X_train = torch.cat([X_train, X_candidates[best_candidate_idx].unsqueeze(0)])
Y_train = torch.cat([Y_train, Y_candidates[best_candidate_idx].unsqueeze(0)])
# remove the best candidate from candidates
X_candidates = torch.cat([X_candidates[:best_candidate_idx], X_candidates[best_candidate_idx+1:]])
Y_candidates = torch.cat([Y_candidates[:best_candidate_idx], Y_candidates[best_candidate_idx+1:]])
# Predict MAE
test_posterior = gp.posterior(X_test)
test_predictions = test_posterior.mean.detach().numpy()
mae = mean_absolute_error(Y_test, test_predictions)
Thank you in advance!
Hmm I think you may be using qNegIntegratedPosteriorVariance incorrectly here. Note that the mc_points arguments takes a (typically relatively dense) set of points across the domain that are used for Monte Carlo integration. But by doing acq_func = qNegIntegratedPosteriorVariance(gp, mc_points=X_candidates) you're passing the candidate set. This results in the acquisition function approximating the integral of the posterior variance by only summing it up on at the candidate points (i.e. incredibly poorly). You should use a different set of points for MC integration instead.
Ah, that makes sense.
I updated my script based on your suggestion. I went ahead and added a subset of the data mc_points = 500. So what it does is select randomly 500 points from the candidates space and uses it as mc_points. The code below is inserted into the forloop under fit_gpytorch_model(mll). Unfortunately, the print statement for the best_index is still printing tensor(0). Was this approach along the right path that you mentioned?
mc_points_size = 500
mc_points_indices = np.random.choice(len(X_candidates), min(mc_points_size, len(X_candidates)), replace=False)
mc_points = X_candidates[mc_points_indices]
Thank you again for taking the time to respond to the numerous questions!
I went ahead and added a subset of the data mc_points = 500. So what it does is select randomly 500 points from the candidates space and uses it as mc_points.
Oh so this means that you're ever only going to evaluate the posterior variance on a subset of the data points. The mc_points are meant to be points that have nothing to do with the candidates - they are just a set of points that are used for Monte-Carlo integration. If you take a look at eq. (2) in https://arxiv.org/pdf/1710.03206.pdf, the approach qNegIntegratedPosteriorVariance takes is to simply Monte-Carlo integrate the integrand over the domain. The mc_points are the samples from the domain to do this integration, they should be uniformly sampled from the domain (the more the better for accuracy, ofc subject to compute and memory limits). So if your domain is [0,1]^d, this would be something like torch.rand(num_mc_samples, d).
Hey @Balandat, thanks for clarifying what the mc_points are suppose to be. I sort of went down the Botorch rabbit hole these last few days and I am starting to get a little more familiar with it 😅 . The first comment you mentioned with using the evaluate_acquisition_function makes a bit more sense now (going to tackle that next). I opted to use a different acquisition function with the snippet of code i posted earlier and just wanted to run this by you.
bounds = torch.tensor([[-5.0, -3.0, -4.0, -6.0, -11.0, -11.0, -4.0, -2.0, -2.0, -2.0],[4.0, 4.0, 4.0, 4.0, 1.0, 1.0, 3.0, 3.0, 3.0,3.0]],dtype=torch.float64)
candidate_set = draw_sobol_samples(bounds=bounds, n=1000, q=1).squeeze(-2)
candidate_set = bounds[0] + (bounds[1] - bounds[0]) * candidate_set
iteration = []
mae_list = []
for i in range(100):
# Fit the Gaussian Process model
gp = SingleTaskGP(X_train, Y_train)
mll = ExactMarginalLogLikelihood(gp.likelihood, gp)
fit_gpytorch_model(mll)
qMES = qMaxValueEntropy(gp,candidate_set)
acq_values = qMES(X_candidates.unsqueeze(-2))
best_candidate_idx = torch.argmax(acq_values)
print(best_candidate_idx)
My concern here was just to make sure this is being used correctly ..i defined the acquisition function and used candidate_set drawn from draw_sobol_samples and then used qMES on the samples i am using for active learning (X_candidates).. acq_values = qMES(X_candidates.unsqueeze(-2))... after that i pull the max
best_candidate_idx = torch.argmax(acq_values) and then added this point into the training set.
Hope this makes sense!
So here you're effectively optimizing the MES acquisition function on a finite set of random candidate points. That's not wrong per se, but one of the big benefits of the BoTorch acquisition functions is that they are differentiale and so you can use gradient-based optimizers to typically get much better solutions, esp. jn high dim spaces. Check out optimize_acqf for how to do this.
EDIT: Just realized that my assumption is incorrect. It seems like the best way to go is evaluate_acquisition_function as it would allow for evaluating on my actual candidate list. Here optimize_acqf is not looking strictly which value in X_candidates optimizes the acquisition function.
ooh i see! I did some digging based on your last comment and found this notebook you contributed to max_value_entropy. Correct me if i am wrong here, the candidate_set in the notebook, would be my X_candidates that i am looking to evaluate..? From the code below this means the returned candidates and acq_value would be the next best candidate to select and train the gp on. Initially in the code above (in the last comment) i treated the candidate_set as we did the mc_points which maybe was not the right approach.
bounds = torch.tensor([[-5.0, -3.0, -4.0, -6.0, -11.0, -11.0, -4.0, -2.0, -2.0, -2.0],[4.0, 4.0, 4.0, 4.0, 1.0, 1.0, 3.0, 3.0, 3.0,3.0]],dtype=torch.float64)
# candidate_set = bounds[0] + (bounds[1] - bounds[0]) * candidate_set
# candidate_set = draw_sobol_samples(bounds=bounds, n=1000, q=1).squeeze(-2)
# # candidate_set = torch.rand(1000, bounds.size(1), device=bounds.device, dtype=bounds.dtype)
# candidate_set = bounds[0] + (bounds[1] - bounds[0]) * candidate_set
iteration = []
mae_list = []
for i in range(len(X_candidates)):
# Fit the Gaussian Process model
gp = SingleTaskGP(X_train, Y_train)
mll = ExactMarginalLogLikelihood(gp.likelihood, gp)
fit_gpytorch_model(mll)
qMES = qMaxValueEntropy(gp,X_candidates)
candidates, acq_value = optimize_acqf(
acq_function=qMES,
bounds=bounds,
q=1,
num_restarts=10,
raw_samples=512,
)
Correct me if i am wrong here, the candidate_set in the notebook, would be my X_candidates that i am looking to evaluate..?
Not quite. The qMaxValueEntropy acquisition function, similar to qNegIntegratedPosteriorVariance also requires a discrete set of points candidate_set in the domain over which to draw samples of the max value (this is just due to how that acquisition function is implemented). Bur this is (maybe confusingly) not the same as X_candidate which would be the set of points from which you consider to pick one to evaluate next. In this case you can. Still optimize the acquisition function (for a given candidate_set numerically without having to pre specify a set of discrete points from which to choose.
i treated the candidate_set as we did the mc_points which maybe was not the right approach.
that is the right approach.
Hey @Balandat, its been a little while, but i reevaluated my approach a bit (hopefully this is closer to what you suggested early on). See sample code please.
I have a few questions here:
-
In this use case, do i need to optimize the acquisition function as in the previous use case (in the above example that we previously spoken about)? I ask because oddly enough my mean absolute error fluctuated a bit and did not decrease as expected (so i assume i am making a mistake somewhere).
-
One issue i encountered was when i was trying to swap out
qMaxValueEntropyforqNegIntegratedPosteriorVariance. For some reason a direct swap in this code raised an error. Here is the error raised.
RuntimeError: Input constructor for acquisition class qNegIntegratedPosteriorVariance not registered. Use the @acqf_input_constructor` decorator to register a new method.
Can this acquisition function be added to evaluate_acquisition_function so that it is compatible? (maybe i should be opening up a separate issue for this).
Here is the sample code:
gs = GenerationStrategy(
steps=[
GenerationStep(
model=Models.BOTORCH_MODULAR,
num_trials=-1,
max_parallelism=3,
model_kwargs={"surrogate": Surrogate(SingleTaskGP),
"botorch_acqf_class": qMaxValueEntropy,
}
),
]
)
test_set_dict = [ObservationFeatures(x) for x in X_test.to_dict(orient="records")]
ax_client = AxClient(generation_strategy=gs)
ax_client.create_experiment(
name="Peak_Prediction",
parameters=parameters,
objective_name="ppm",
# minimize=False,
)
# Attach and complete trials for the initial training set
for j in range(len(X_train)):
_, trial_index = ax_client.attach_trial(X_train.iloc[j, :].to_dict())
ax_client.complete_trial(
trial_index=trial_index, raw_data=Y_train.iloc[j].to_dict()
)
print("fitted Training data")
iteration=[]
mae_list = []
trials = 0
len_X_candidates = len(X_candidates)
for i in range(len_X_candidates):
ax_client.fit_model()
model_bridge = ax_client.generation_strategy.model
observation_features = [
ObservationFeatures(x) for x in X_candidates.to_dict(orient="records")
]
acqf_values = model_bridge.evaluate_acquisition_function(
observation_features=observation_features)
max_value = max(acqf_values)
max_index = acqf_values.index(max_value)
next_experiment = X_candidates.iloc[max_index]
Y_candidate_value = Y_candidates.iloc[max_index]
# Add the features with the max value to X_train and remove from X_candidates
X_train = pd.concat([X_train, next_experiment.to_frame().T], ignore_index=True)
X_candidates = X_candidates.drop(X_candidates.index[max_index], axis=0)
# Attach and complete the next trial
_, trial_index = ax_client.attach_trial(next_experiment.to_dict())
ax_client.complete_trial(
trial_index=trial_index, raw_data=Y_candidate_value.to_dict()
)
trials = trials + 1
model = ax_client.generation_strategy.model
y_predicted = model.predict(test_set_dict)
predicted_y_values = y_predicted[0]['ppm']
mae = mean_absolute_error(Y_test, predicted_y_values)
i = i + 1
iteration.append(i)
mae_list.append(mae)
Thank you for your time!
@ramseyissa, maybe the following will be of help?
Note: copied from https://github.com/facebook/Ax/issues/1312
Here is a self-contained example using vanilla single-objective optimization based on https://github.com/facebook/Ax/issues/460#issuecomment-975233349:
from typing import Any, Dict, Optional
import torch
# from ax.core.objective import ScalarizedObjective
from ax.modelbridge import get_sobol
from ax.modelbridge.generation_strategy import GenerationStep, GenerationStrategy
from ax.modelbridge.registry import Models
from ax.models.torch.botorch_modular.surrogate import Surrogate
from ax.service.ax_client import AxClient
from botorch.acquisition.active_learning import (
MCSampler,
qNegIntegratedPosteriorVariance,
)
from botorch.acquisition.input_constructors import (
MaybeDict,
acqf_input_constructor,
construct_inputs_mc_base,
)
from botorch.acquisition.objective import AcquisitionObjective
from botorch.models.gp_regression import SingleTaskGP
from botorch.models.model import Model
from botorch.utils.datasets import SupervisedDataset
from torch import Tensor
@acqf_input_constructor(qNegIntegratedPosteriorVariance)
def construct_inputs_qNIPV(
model: Model,
mc_points: Tensor,
training_data: MaybeDict[SupervisedDataset],
objective: Optional[AcquisitionObjective] = None,
X_pending: Optional[Tensor] = None,
sampler: Optional[MCSampler] = None,
**kwargs: Any,
) -> Dict[str, Any]:
if model.num_outputs == 1:
objective = None
base_inputs = construct_inputs_mc_base(
model=model,
training_data=training_data,
sampler=sampler,
X_pending=X_pending,
objective=objective,
)
return {**base_inputs, "mc_points": mc_points}
def objective_function(x):
f = x["x1"] ** 2 + x["x2"] ** 2 + x["x3"] ** 2
return {"f": (f, None)}
parameters = [
{"name": "x1", "type": "range", "bounds": [0.0, 1.0], "value_type": "float"},
{"name": "x2", "type": "range", "bounds": [0.0, 1.0], "value_type": "float"},
{"name": "x3", "type": "range", "bounds": [0.0, 1.0], "value_type": "float"},
]
ax_client_tmp = AxClient()
ax_client_tmp.create_experiment(parameters=parameters)
sobol = get_sobol(ax_client_tmp.experiment.search_space)
mc_points = sobol.gen(1024).param_df.values
mcp = torch.tensor(mc_points)
model_kwargs_val = {
"surrogate": Surrogate(SingleTaskGP),
"botorch_acqf_class": qNegIntegratedPosteriorVariance,
"acquisition_options": {"mc_points": mcp},
}
gs = GenerationStrategy(
steps=[
GenerationStep(model=Models.SOBOL, num_trials=num_sobol),
GenerationStep(
model=Models.BOTORCH_MODULAR, num_trials=num_qnipv, model_kwargs=model_kwargs_val
),
]
)
ax_client = AxClient(generation_strategy=gs)
ax_client.create_experiment(
name="active_learning_experiment",
parameters=parameters,
objective_name="f",
minimize=True,
)
for _ in range(20):
trial_params, trial_index = ax_client.get_next_trial()
data = objective_function(trial_params)
ax_client.complete_trial(trial_index=trial_index, raw_data=data["f"])
@sgbaird, thank you very much, this was very helpful! I managed to tweak this a bit for my use case, and swapped out the qNIPV acquisition function for qMES using its own input constructor. I have a few question that you might be able to address here and if @Balandat has some input on this that would also be appreciated.
Ok so the question is, when calling evaluate_acquisition_function(observation_features=observation_features), observation features here is a dictionary of my candidate list that i am looking to pick the next sample from, do i need to call optimize_acqf function at some point before using evaluate_acquisition_function or is this automatically handled here?
Thank you both for your time!
This is the input constructor used for qMES for reference:
@acqf_input_constructor(qMaxValueEntropy)
def construct_inputs_qMES(
model: Model,
training_data: MaybeDict[SupervisedDataset],
bounds: List[Tuple[float, float]],
objective: Optional[MCAcquisitionObjective] = None,
# posterior_transform: Optional[PosteriorTransform] = None,
candidate_size: int = 1000,
**kwargs: Any,
) -> Dict[str, Any]:
r"""Construct kwargs for `qMaxValueEntropy` constructor."""
inputs_mc = construct_inputs_mc_base(
model=model,
training_data=training_data,
objective=objective,
**kwargs,
)
X = _get_dataset_field(training_data, "X", first_only=True)
_kw = {"device": X.device, "dtype": X.dtype}
_rvs = torch.rand(candidate_size, len(bounds), **_kw)
_bounds = torch.tensor(bounds, **_kw).transpose(0, 1)
return {
**inputs_mc,
"candidate_set": _bounds[0] + (_bounds[1] - _bounds[0]) * _rvs,
"maximize": kwargs.get("maximize", True),
}
I can't really comment on the accuracy of the construct_inputs_qMES function (other than, if it doesn't throw errors, that's a good sign 😉) . I don't think you need to deal with optimize_acqf if you're using evaluate_acquisition_function.
Indeed, you're evaluating the acquisition function on a discrete set here using evaluate_acquisition_function, so you can just pick the best and not use optimize_acqf.
Recapping some internal discussion, @ramseyissa is predicting peak positions for an NMR spectra, so a single set of inputs typically has multiple outputs. Depending on the number of new atoms added to each simulation, there could be anywhere from 1 to 16 peaks. I think the possible locations of the peak are on the order of 100's, if not continuous, and 16 outputs might be too much in terms of modeling output behavior.
One option would be to train a VAE to move a spectrum object back and forth from a low-dimensional representation (e.g., ~4 values) and a high-dimensional version (16 values). The ppm values could be sorted to reduce degeneracies or data augmentation could be applied (swapping values).
https://github.com/facebook/Ax/issues/1312#issue-1488742013
Then, I think ScalarizedPosteriorTransform gets used to compute a sort of joint acquisition function. Maybe this can be specified in the part of the generation strategy that gets passed more directly to BoTorch)
EDIT: @ramseyissa, also, I'm not sure if Max-value Entropy Search (MES) is the right model for your application since it still seems to be focused on minimizing or maximizing some output property that you're modeling rather than minimizing the global uncertainty, irrespective of whether the property is low or high. From https://botorch.org/tutorials/max_value_entropy:
Max-value entropy search (MES) acquisition function quantifies the information gain about the maximum of a black-box function
emphasis added
@Balandat @sgbaird :
I have been trying to use qNIPV in the same way i used MES, but i keep getting an error that isn't making much sense to me. In my problem just to remind you, i am not looking to optimize my objective value, i am just looking to minimize my global uncertainty for a predefined list of candidates.
It seems that when using the construct_inputs_qNIPV it is not recognizing mc_points that is leading to the error (the error is seen at the bottom). Any suggestions on how to address this?
@acqf_input_constructor(qNegIntegratedPosteriorVariance)
def construct_inputs_qNIPV(
model: Model,
mc_points: Tensor,
training_data: MaybeDict[SupervisedDataset],
objective: Optional[AcquisitionObjective] = None,
X_pending: Optional[Tensor] = None,
sampler: Optional[MCSampler] = None,
**kwargs: Any,
) -> Dict[str, Any]:
if model.num_outputs == 1:
objective = None
base_inputs = construct_inputs_mc_base(
model=model,
training_data=training_data,
sampler=sampler,
X_pending=X_pending,
objective=objective,
)
return {**base_inputs, "mc_points": mc_points}
# the objective function here is just predicting on a test set and calculating the mae
# with the model that is updated through an active learning loop
def objective_function(model, test_set_dict, test_set_target_vales):
# get gp
model = ax_client.generation_strategy.model
# predict on test set
y_predicted = model.predict(test_set_dict)
# get predicted values
predicted_y_values = y_predicted[0]['ppm']
# calculate mae
mae = mean_absolute_error(test_set_target_vals, predicted_y_values)
print(f'MAE: {mae}')
return {"mae": (mae, None)}
init_len_of_x_train = len(train_dfs)
parameters = [
{"name": vals, "type": "range", "bounds": [-15.0, 15.0], "value_type": "float"}
for vals in descriptors_df.columns]
# this is used to get the mc_points for the acquisition function
ax_client_tmp = AxClient()
ax_client_tmp.create_experiment(parameters=parameters)
sobol = get_sobol(ax_client_tmp.experiment.search_space)
mc_points = sobol.gen(1024).param_df.values
mcp = torch.tensor(mc_points)
# dict of kwargs for BoTorch modular in generation strategy
model_kwargs_val = {
"surrogate": Surrogate(SingleTaskGP),
"botorch_acqf_class": qNegIntegratedPosteriorVariance,
"acquisition_options": {"mc_points": mcp},
}
gs = GenerationStrategy(
steps=[
# GenerationStep(model=Models.SOBOL, num_trials=5),
GenerationStep(
model=Models.BOTORCH_MODULAR, num_trials=-1
, model_kwargs=model_kwargs_val
),
]
)
ax_client = AxClient(generation_strategy=gs)
ax_client.create_experiment(
name="active_learning_experiment",
parameters=parameters,
objective_name="ppm",
# minimize=True,
)
# test set
test_set_dict = [ObservationFeatures(x) for df in test_dfs for x in df.to_dict(orient="records")]
#get test set target values
test_set_target_vals = get_target_values(test_dfs, ppm_df, 'ppm')
#get train set target values
train_set_target_vals = get_target_values(train_dfs, ppm_df, 'ppm')
# my training data is a list of dataframes, so I need to attach each dataframe to ax_client
for df in train_dfs:
for idx in range(len(df)):
_, trial_index = ax_client.attach_trial(df.iloc[idx, :].to_dict())
ax_client.complete_trial(
trial_index=trial_index, raw_data=train_set_target_vals.iloc[idx].to_dict()
)
index_of_candidate_set = []
iteration=[]
mae_list = []
trials = 0
len_X_candidates = len(x_candidates)
for i in range(len_X_candidates):
print("size of X_candidates",len(x_candidates))
print("size of X_train",len(train_dfs))
# fit gp
ax_client.fit_model()
# get gp
model_bridge = ax_client.generation_strategy.model
acqf_values = sum_acqf_values(x_candidates, model_bridge)
#get max of acqf values
max_value = max(acqf_values)
#get index of max value
max_index = acqf_values.index(max_value)
# locate features in orginal candidate set
next_experiment = x_candidates[max_index]
next_exp_target_vals = get_target_values(next_experiment, ppm_df, 'ppm')
# add the best candidate selected by acquisition fx experiment to training set
train_dfs.append(next_experiment)
#drop the best candidate selected by acquisition fx experiment from candidate set
x_candidates.pop(max_index)
# attach this new trail to ax_client, and complete it with the target value "ppm"
for idx in range(len(next_experiment)):
_, trial_index = ax_client.attach_trial(next_experiment.iloc[idx, :].to_dict())
ax_client.complete_trial(
trial_index=trial_index, raw_data=next_exp_target_vals.iloc[idx].to_dict())
trials = trials + 1
model = ax_client.generation_strategy.model
mae_vals = objective_function(model, test_set_dict, test_set_target_vals)
i = i + 1
iteration.append(i)
print(f"Iteration {i+1}: MAE = {mae_vals:.4f}")
mae_list.append(mae_vals)
index_of_candidate_set.append(max_index)
exp_df = exp_to_df(ax_client.experiment)
num_of_zeros = len(train_dfs)
exp_df['MAE'] = [0] * init_len_of_x_train + mae_list
# exp_df
The resulting error from the above script is:
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
Cell In[31], line 193
190 # get gp
191 model_bridge = ax_client.generation_strategy.model
--> 193 acqf_values = sum_acqf_values(x_candidates, model_bridge)
194 # print("len of acqf_values",len(acqf_values))
195
196 # turn list of candidates into dict of observation features
(...)
205
206 # #get max of acqf values
207 max_value = max(acqf_values)
Cell In[16], line 5, in sum_acqf_values(list_of_dfs, model_bridge)
3 for df in list_of_dfs:
4 observation_features = [ObservationFeatures(x) for x in df.to_dict(orient="records")]
----> 5 acqf_values = model_bridge.evaluate_acquisition_function(observation_features=observation_features)
6 # print('acqf values', acqf_values)
7 sum_acqf_values_list.append(sum(acqf_values))
File [/opt/miniconda3/envs/nmrpeaks/lib/python3.10/site-packages/ax/modelbridge/torch.py:434](https://file+.vscode-resource.vscode-cdn.net/opt/miniconda3/envs/nmrpeaks/lib/python3.10/site-packages/ax/modelbridge/torch.py:434), in TorchModelBridge.evaluate_acquisition_function(self, observation_features, search_space, optimization_config, pending_observations, fixed_features, acq_options)
425 obs_feats[i] = t.transform_observation_features(batch)
427 base_gen_args = self._get_transformed_gen_args(
428 search_space=search_space,
...
305 )
306 self.acqf = botorch_acqf_class(**acqf_inputs) # pyre-ignore [45]
307 self.X_pending: Optional[Tensor] = unique_Xs_pending
TypeError: construct_inputs_qNIPV() missing 1 required positional argument: 'mc_points'
@ramseyissa, can you provide a reproducer? i.e., something that can be copy-pasted and run standalone (preferably a link to a Google Colab notebook 😉)
I have a MWE example for qNIPV, though without anything special. Just 3 float inputs and 1 float output. If you can use this script as a template and gradually adapt it to your use-case, you should be able to pinpoint the discrepancy (and possibly resolve the error you were running into).
EDIT: this needs to be updated to account for transformations per Max's comment https://github.com/facebook/Ax/issues/1557#issuecomment-1624582689
@sgbaird,
I went through the MWE example for qNIPV that was referenced above and it was very helpful. I believe it is the same working example you referenced in https://github.com/facebook/Ax/issues/1557#issuecomment-1528840459 earlier on.
Ok so I think I found the issue. For some reason when you call model_bridge.evaluate_acquisition_function you must use the acq_options argument when using the qNIPV acquisition function, even if specifying acquisition_options in model_kwargs within GenerationStrategy, which differs from when i used the MES acquisition function (which also required MC points).
So I think all finally looks good, but maybe a final check from @Balandat would seal the deal (I can provide the reproducer @sgbaird was after :wink:)
Ok so I think I found the issue. For some reason when you call model_bridge.evaluate_acquisition_function you must use the acq_options argument when using the qNIPV acquisition function, even if specifying acquisition_options in model_kwargs within GenerationStrategy, which differs from when i used the MES acquisition function (which also required MC points).
Hmm this is interesting. I didn't go into the weeds on this but I believe this might be due to the fact that qMaxValueEntropy has an input constructor that generating a set of candidate_size candidate_points. The difference to the acqf constructor for qNIPV used here is that candidate_size has a default value, so that gets used if no candidate_size is given in the acquisition_options : https://github.com/pytorch/botorch/blob/6f9818756b6a3f83ad52aa3a6cd7b2591bd3ce0e/botorch/acquisition/input_constructors.py#L865
Gotcha alert: One thing to note here is that the BoTorch model when wrapped in Ax operates in the transformed space (typically the unit cube if all parameters are continuous parameters). So mc_points here will need to be passed in that transformed space rather than the original space over which the parameters are defined. Otherwise you'll get some very funky results (e.g. say your parameters are specified on [1000, 2000]^d, and you choose your mc_points uniformly over the same set, then in the transformed space the model will operate on [0, 1]^d and so most if not all of the mc_points will not actually lie in what corresponds to the search space - in this case "very funky" would mean total garbage). Unfortunately it's at least at this point not straightforward to apply the Ax transforms automatically to the mc_points. Hopefully we can come up with a better solution for this in the future. cc @saitcakmak, @dme65
@Balandat would it be easier to scale to [0, 1]^d outside of Ax using sklearn's MinMaxScaler? Or, assuming UnitX is the only transform at play, would you recommend using UnitX.transform_observation_features?
@Balandat would it be easier to scale to
[0, 1]^doutside of Ax using sklearn's MinMaxScaler? Or, assumingUnitXis the only transform at play, would you recommend using UnitX.transform_observation_features?
Either way should work. But yeah, scaling the data before passing it to Ax is the right thing to do here if you want to pass in the mc points (and assuming you have only continuous parameters).
@Balandat does this look OK?
# MC Points
# WARNING: assumes only UnitX transform, https://ax.dev/docs/models.html#transforms
num_mc_sobol = 2**13
sobol = get_sobol(ax_client_tmp.experiment.search_space)
# mc_points = sobol.gen(1024).param_df.values
obs_features = [
ObservationFeatures(parameters)
for parameters in sobol.gen(num_mc_sobol).param_df.to_dict("records")
]
ux = UnitX(ax_client_tmp.experiment.search_space)
obs_features_ux = ux.transform_observation_features(obs_features)
mc_points = [list(obs.parameters.values()) for obs in obs_features_ux]
mcp = torch.tensor(mc_points)
mcp
Hmm yeah that seems like it should work! I guess since UnitX transforms to the unit cube it should be enough to just generate the points directly without Ax involvement:
mcp = torch.quasirandom.SobolEngine(num_params, scramble=True).draw_base2(13)
Again this assumes that there aren't any more complex things going on such as choice parameters or the like..
@Balandat, thanks for the insight.
Me and @sgbaird have had some internal discussion regarding some of the items discussed above. The result of this are some questions i need clarification on.
- In the reply below, when you mentioned "scaling the data before passing it to Ax is the right thing to do here.." did you also mean that the data that we are working with should be scaled (not just the mc points) but the training, x_candidates, and test set? I mean this as in a preprocessing step before beginning an Ax optimization strategy. Also, just a side note here, from what i understand Ax typically scales the data that is passed to it internally?
@Balandat would it be easier to scale to
[0, 1]^doutside of Ax using sklearn's MinMaxScaler? Or, assumingUnitXis the only transform at play, would you recommend using UnitX.transform_observation_features?
Either way should work. But yeah, scaling the data before passing it to Ax is the right thing to do here if you want to pass in the mc points (and assuming you have only continuous parameters).
- Another clarification, when looking at a list of points that have been evaluated using
evaluate_acquisition_function(when using the qNIPV acquisition function), should i be selecting the most negative point in the list, so the min value since it would be the most negative?
- did you also mean that the data that we are working with should be scaled (not just the mc points) but the training, x_candidates, and test set?
Pretty sure the search space (i.e., parameters and related kwargs) can be left untouched as long as we scale mcp appropriately (i.e., the only change that needs to occur is in transforming mcp).
2. should i be selecting the most negative point in the list
My Intuition is to always take the maximum of the acquisition function, since acq fns are used to "... evaluate the usefulness of one of more design points ...". I'll defer to @Balandat though to clarify