
warnings.warn("Fitting failed on all retries.", RuntimeWarning)

MarcAmil30 opened this issue · 8 comments

🐛 Bug

I am new to BoTorch and programming, and when implementing a simple Bayesian optimization loop I get this warning: `warnings.warn("Fitting failed on all retries.", RuntimeWarning)`

I looked at the source code, but I could not discover what triggers this warning.

To reproduce

**Code snippet to reproduce**

```python
from botorch.models import SingleTaskGP
from botorch.fit import fit_gpytorch_model
from botorch.acquisition import qExpectedImprovement
from botorch.optim import optimize_acqf
from gpytorch.mlls import ExactMarginalLogLikelihood

def get_next_points(init_x, init_y, best_init_y, bounds, n_points=1):
    # Fit a single-output GP to the observed data.
    model = SingleTaskGP(init_x, init_y)
    mll = ExactMarginalLogLikelihood(model.likelihood, model)
    fit_gpytorch_model(mll)

    EI = qExpectedImprovement(model=model, best_f=best_init_y)

    candidates, _ = optimize_acqf(
        acq_function=EI,
        bounds=bounds,
        q=n_points,  # q tells optimize_acqf how many candidates to return
        num_restarts=10,
        raw_samples=1024,
    )
    return candidates
```

get_next_points(X, Y, best_init_y2, bounds2, n_points=24)

**Stack trace/error message**

```
warnings.warn("Fitting failed on all retries.", RuntimeWarning)
```

Expected Behavior

24 candidate points to try in the next experiment, without the warning.


MarcAmil30 avatar May 17 '22 23:05 MarcAmil30

Hi @MarcAmil30. Can you provide a minimally reproducible example with the training data that leads to this warning? It is hard to say what's happening without having the particular X / Y.

saitcakmak avatar May 17 '22 23:05 saitcakmak

`X_train` shape: `torch.Size([567, 21])`; `y_train` shape: `torch.Size([567, 1])`

Here is some of the data (the X values are the different concentration combinations I want to try, produced by a Box-Behnken design, while the y values are the output growth rates). These are eventually turned into tensors using the function shown below.

X output is:

```
[[3.60e-02 6.00e-03 4.94e-05 4.00e-05 7.90e-05 1.00e-01 6.00e-03 2.86e-03 4.00e-02 1.00e-01 4.00e-02 7.50e-02 1.81e-03 2.00e-02 1.00e-03 3.90e-03 1.00e+00 1.50e+00 2.00e+00 1.00e+00 2.22e-04]
 [3.60e-02 6.00e-03 4.94e-05 4.00e-05 7.90e-04 1.00e-02 6.00e-03 2.86e-03 4.00e-02 1.00e-01 4.00e-02 7.50e-02 1.81e-03 2.00e-02 1.00e-03 3.90e-04 1.00e+00 1.50e+00 2.00e+00 0.00e+00 2.22e-04]
 [3.60e-02 6.00e-03 4.94e-05 4.00e-05 7.90e-05 1.00e-02 6.00e-03 2.86e-03 4.00e-02 0.00e+00 4.00e-02 7.50e-02 1.81e-03 2.00e-02 1.00e-03 3.90e-04 1.00e+00 1.50e+00 2.00e+00 0.00e+00 2.22e-04]
 [3.60e-02 6.00e-03 4.94e-05 4.00e-05 7.90e-05 1.00e-02 6.00e-03 2.86e-03 4.00e-02 1.00e-01 4.00e-02 7.50e-02 1.81e-03 2.00e-02 1.00e-03 3.90e-04 0.00e+00 1.50e+00 2.00e+00 0.00e+00 2.22e-04]
 [3.60e-02 6.00e-03 4.94e-05 4.00e-05 7.90e-05 1.00e-02 6.00e-03 2.86e-03 4.00e-02 1.00e-01 4.00e-02 7.50e-02 1.81e-03 2.00e-02 1.00e-03 3.90e-04 1.00e+00 1.50e+00 2.00e+00 0.00e+00 0.00e+00]
 [3.60e-02 6.00e-03 4.94e-05 4.00e-05 7.90e-05 1.00e-02 6.00e-03 2.86e-03 4.00e-02 1.00e-01 4.00e-02 7.50e-02 0.00e+00 2.00e-02 1.00e-03 3.90e-04 1.00e+00 1.50e+00 2.00e+00 0.00e+00 2.22e-04]
 [3.60e-02 6.00e-03 4.94e-05 4.00e-05 7.90e-05 1.00e-02 6.00e-03 2.86e-03 4.00e-02 1.00e+00 4.00e-02 7.50e-02 1.81e-03 2.00e-02 1.00e-03 3.90e-04 1.00e+00 1.50e+00 2.00e+00 0.00e+00 2.22e-04]
 [3.60e-02 6.00e-03 4.94e-05 4.00e-05 7.90e-05 1.00e-02 6.00e-03 2.86e-03 4.00e-02 1.00e-01 4.00e-02 7.50e-02 1.81e-03 2.00e-02 1.00e-03 3.90e-04 1.00e+00 1.50e+00 2.00e+00 0.00e+00 2.22e-03]
```

y output is:

```
2.68039784e+00 2.22912800e+00 2.20514846e+00 2.16908462e+00 2.15662274e+00
2.11923719e+00 2.05069716e+00 2.03766884e+00 2.00217143e+00 1.98310102e+00
1.95742208e+00 1.94986944e+00 1.94798130e+00 1.94288322e+00 1.92626748e+00
1.90436481e+00 1.89907799e+00 1.89303590e+00 1.89171419e+00 1.88510555e+00
1.86282542e+00 1.85565035e+00 1.84772001e+00 1.82449575e+00 1.82336289e+00
1.82147464e+00 1.79872551e+00 1.79258587e+00 1.78654377e+00 1.78276751e+00
1.78125695e+00 1.77748069e+00 1.76898392e+00 1.75878786e+00 1.74009514e+00
1.73178711e+00 1.73033091e+00 1.72650040e+00 1.71290554e+00 1.70327603e+00
1.69685624e+00 1.69440167e+00 1.69024771e+00 1.68968128e+00]
```

This is the code:

```python
import pandas as pd
import torch

def csvconverter(file_name):
    # Read the CSV file and load the row data into variables.
    file_out = pd.read_csv(file_name)
    parameters = ['CaCl2_2H2O', 'Citric_acid', 'Co(NO3)2_6H2O', 'CoCl2_6H2O',
                  'CuSO2_5H2O', 'FeCl3_6H2O', 'Ferric_ammonium_citrate', 'H3BO3',
                  'K2HPO4', 'KCl', 'KH2PO4', 'MgSO4_7H2O', 'MnCl2_4H2O', 'Na2CO3',
                  'Na2EDTA', 'Na2MoO4_2H2O', 'NaCl', 'NaNO3', 'TES', 'Tris',
                  'ZnSO4_7H2O', 'Normalised_Mean_gDWL']
    # Concentrations were already divided by 1000 to convert g/L to kg/L.
    file_out = file_out[parameters]
    x = file_out.iloc[:, 0:21].values
    y = file_out.iloc[:, 21].values

    # Convert to torch tensors.
    X_train = torch.tensor(x, dtype=torch.float32)
    y_train = torch.tensor(y).unsqueeze(-1)
    best_observed_value = y_train.max().item()

    return X_train, y_train, best_observed_value
```
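One thing worth noting in the converter above is the dtype mismatch: `X_train` is built as `float32` while `y_train` inherits the CSV's default `float64`, and BoTorch generally recommends double precision throughout. A minimal sketch of the same conversion with consistent dtypes, using a small in-memory DataFrame with hypothetical column names in place of the real 21-column CSV:

```python
import pandas as pd
import torch

# Hypothetical two-feature stand-in for the real 21-column CSV.
df = pd.DataFrame({
    "feat_a": [0.036, 0.006, 0.049],
    "feat_b": [0.10, 0.01, 0.02],
    "Normalised_Mean_gDWL": [2.68, 2.23, 2.21],
})

# Keep X and y in the same (double) precision.
X_train = torch.tensor(df.iloc[:, :-1].values, dtype=torch.double)
y_train = torch.tensor(df.iloc[:, -1].values, dtype=torch.double).unsqueeze(-1)
best_observed_value = y_train.max().item()
```

Mixed `float32`/`float64` tensors can themselves cause errors once the model and acquisition function start combining them.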

MarcAmil30 avatar May 18 '22 07:05 MarcAmil30

Additionally, is there a way to visualise, or to know, that the predicted candidates are correct before trying out the experiment?

Currently, the output I get is just a list of candidates from the code below.

```python
def get_next_points(init_x, init_y, best_init_y, bounds, n_points=1):
    # Fit a single-output GP to the observed data.
    model = SingleTaskGP(init_x, init_y)
    mll = ExactMarginalLogLikelihood(model.likelihood, model)
    fit_gpytorch_model(mll)

    EI = qExpectedImprovement(model=model, best_f=best_init_y)

    candidates, _ = optimize_acqf(
        acq_function=EI,
        bounds=bounds,
        q=n_points,  # q tells optimize_acqf how many candidates to return
        num_restarts=10,
        raw_samples=1024,
    )
    return candidates
```

MarcAmil30 avatar May 18 '22 07:05 MarcAmil30

> Additionally, is there a way to visualise or know that the following candidates predicted are correct before trying out the experiment?

What do you mean by "correct"? You can do that retrospectively, but prospectively it is a lot harder. One thing you could do is check the model fit (are the predictions well calibrated?), predict the outcomes, and compare them to the training data of the model.

Balandat avatar May 19 '22 04:05 Balandat

Hi, any reason why this might occur? I noticed that this error occurs for me when I use data with many points that share the same features in certain dimensions. For instance, data similar to the following X, Y, where the first three dims are features of a training example and the last three dims are parameters of the objective function. In this case, the MOBO loop is written with the first three dims fixed for any given evaluation example (the GP is first fit to a training set), and I want to find optimal values for the last three dims given the values of the first three.

I noticed that the data above also has a set of repeated features.

tensor([[242.3015, 175.3548,   0.7095,   0.5006,   9.4673,  16.9167],
        [242.3015, 175.3548,   0.7095,   0.4810,   9.8588,  17.2720],
        [242.3015, 175.3548,   0.7095,   0.4711,  11.3015,  14.1272],
        [242.3015, 175.3548,   0.7095,   0.5300,   9.7603,  14.0485]],
       dtype=torch.float64)

tensor([[-6.8171e-05, -2.9534e-01],
        [-9.2130e-05, -3.1523e-01],
        [-1.1925e-04, -3.0057e-01],
        [-5.8488e-05, -2.6671e-01]], dtype=torch.float64)

123epsilon avatar Aug 11 '22 19:08 123epsilon

Hi @123epsilon. Having lots of repeated or very close observations in the training data can lead to numerical singularity of the covariance matrix, which can cause the model fitting to fail, or break other parts of the BO loop where we compute and sample from the posterior of the GP. You can try wrapping the relevant part of the code in a `with gpytorch.settings.cholesky_max_tries(<value>):` context (the default is 3; larger values are more tolerant) to try to force things through, at the expense of biasing the results, i.e., the GP predictions.

saitcakmak avatar Aug 12 '22 21:08 saitcakmak

@saitcakmak I see, I noticed this doesn't happen when I do not include the first three features. Are there any other established ways to incorporate features like this in a dataset for a GP?

123epsilon avatar Aug 13 '22 18:08 123epsilon

I haven't worked too much with repeated values in inputs. This is closer to multi-task / multi-fidelity settings, so @qingfeng10 & @danielrjiang might have better suggestions.

saitcakmak avatar Aug 15 '22 21:08 saitcakmak

Closing as answered, but feel free to reopen with further bugs or questions.

esantorella avatar Jan 30 '23 17:01 esantorella