
Feature Request: surrogate models of the objectives

Open sdaulton opened this issue 3 years ago • 2 comments

Hey @sgbaird!

This repo is super cool! It is great to see Ax is useful for these optimization problems.

In the interest of lightweight R&D, it would be awesome if this repo had multi-fidelity surrogate models of the objective functions. This would make it easier to develop better Bayesian optimization methods (and run multiple replications of optimization loops), without needing the custom hardware.

Would it be possible to add some multi-fidelity surrogate models of the objective functions (e.g. Random Forests) to the repo that could be downloaded and used?

Thanks!

cc @eytan @balandat

sdaulton, Nov 02 '22 15:11

Hi @sdaulton.

> This repo is super cool! It is great to see Ax is useful for these optimization problems.

Thank you!

> In the interest of lightweight R&D, it would be awesome if this repo had multi-fidelity surrogate models of the objective functions. This would make it easier to develop better Bayesian optimization methods (and run multiple replications of optimization loops), without needing the custom hardware.

I'm planning to submit a precomputed dataset as part of Olympus (https://github.com/aspuru-guzik-group/olympus/issues/17). I agree about the benefit of being able to use it without the hardware. While I intend to maintain the publicly accessible hardware for a long time, I recognize that something may come up in the future that makes it inaccessible. The measurements also change depending on whether the lights in the room are on 💡😄.

> Would it be possible to add some multi-fidelity surrogate models of the objective functions (e.g. Random Forests) to the repo that could be downloaded and used?

Great suggestion. Let me know what you think of the following plan for recording a multi-fidelity dataset and adding surrogate models to the repo:

  • Probably just host the data as a CSV file in this repo, and make the data easy to retrieve via the Python API
  • Create a separate model for each of the eight discrete wavelength objectives, and make it easy to compute the scalarized objectives (MAE, RMSE, and Fréchet distance) against a random target color that should be matched
  • For simplicity and compatibility, maybe just sklearn's RandomForestRegressor for the model (a regressor rather than a classifier, since the wavelength objectives are continuous). Probably not too critical to use something heavier-duty as long as there are enough datapoints
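
To make the second bullet concrete, here is a minimal sketch of how the three scalarized objectives could be computed from eight per-channel values. Everything named here is an assumption for illustration: the function names, the dictionary keys, and treating each spectrum as a curve of (channel index, intensity) points for the discrete Fréchet distance. It is not the repo's actual API, and the per-channel predictions would come from whatever surrogate models end up being fitted.

```python
import math


def mae(target, predicted):
    """Mean absolute error across wavelength channels."""
    return sum(abs(t - p) for t, p in zip(target, predicted)) / len(target)


def rmse(target, predicted):
    """Root mean squared error across wavelength channels."""
    return math.sqrt(
        sum((t - p) ** 2 for t, p in zip(target, predicted)) / len(target)
    )


def discrete_frechet(target, predicted):
    """Discrete Frechet distance between two spectra.

    Assumption: each spectrum is treated as a curve whose points are
    (channel index, intensity) pairs.
    """
    p = list(enumerate(target))
    q = list(enumerate(predicted))
    n, m = len(p), len(q)
    # ca[i][j] = coupling distance between the prefixes p[:i+1] and q[:j+1]
    ca = [[0.0] * m for _ in range(n)]
    for i in range(n):
        for j in range(m):
            d = math.dist(p[i], q[j])
            if i == 0 and j == 0:
                ca[i][j] = d
            elif i == 0:
                ca[i][j] = max(ca[0][j - 1], d)
            elif j == 0:
                ca[i][j] = max(ca[i - 1][0], d)
            else:
                best_prev = min(ca[i - 1][j], ca[i - 1][j - 1], ca[i][j - 1])
                ca[i][j] = max(best_prev, d)
    return ca[n - 1][m - 1]


def scalarized_objectives(target, predicted):
    """Collect all three scalarized objectives for one target color."""
    return {
        "mae": mae(target, predicted),
        "rmse": rmse(target, predicted),
        "frechet": discrete_frechet(target, predicted),
    }


# Hypothetical eight-channel spectra (made-up numbers, not measured data).
target_spectrum = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
predicted_spectrum = [x + 1.0 for x in target_spectrum]
scores = scalarized_objectives(target_spectrum, predicted_spectrum)
```

Keeping the per-channel models separate from the scalarization means the same fitted surrogates can be reused for any target color or any of the three error metrics.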

Open to feedback here, and I think the notebook that creates the above dataset and surrogate models will also be helpful for people to look at.

sgbaird, Nov 02 '22 20:11

That sounds great! Random Forests make sense (perhaps on the average value from multiple measurements if noise is a concern). Thanks!
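
As a rough illustration of the averaging idea, the sketch below groups repeated measurements by their input settings and averages each channel before any model is fitted. The row layout (a tuple of inputs paired with a list of channel readings), the three-value inputs, and the function name `average_repeats` are hypothetical; the actual dataset format has not been decided in this thread.

```python
from collections import defaultdict
from statistics import mean


def average_repeats(rows):
    """Average repeated measurements taken at identical input settings.

    rows: iterable of (inputs, channels) pairs, where `inputs` is a sequence
    of control settings and `channels` is a list of per-wavelength readings.
    Returns a dict mapping each unique input tuple to the channel-wise mean.
    """
    grouped = defaultdict(list)
    for inputs, channels in rows:
        grouped[tuple(inputs)].append(channels)
    return {
        inputs: [mean(column) for column in zip(*repeats)]
        for inputs, repeats in grouped.items()
    }


# Hypothetical readings: two repeats at one setting, one at another.
rows = [
    ((10, 20, 30), [1.0, 2.0]),
    ((10, 20, 30), [3.0, 4.0]),
    ((0, 0, 0), [5.0, 6.0]),
]
averaged = average_repeats(rows)
```

The averaged inputs and channel means could then serve as the training set for a random forest model, which reduces the influence of measurement noise on the fitted surrogate.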

sdaulton, Nov 03 '22 00:11