Refactor surrogates in blackbox repository
Currently, surrogates may return inconsistent metric curves (e.g., elapsed_time not monotonic w.r.t. fidelity). It is also unclear how seed is treated in a surrogate.
Will use multivariate regression natively supported in scikit-learn. We already use that w.r.t. num_objectives. The input of the model will be the HP config only. The old way can still be used, but it won't be the default.
Will also sort out the situation with seed.
Could you explain why multivariate regression would solve the monotonicity issue? This part is not clear to me.
Regarding the seed, this information is not used in the sense that all evaluations are used to estimate the surrogate (which is a point predictor at the moment). I am also not sure what you mean by sorting out the situation with the seed.
Hi David, multivariate regression as built into sklearn (NOT one regressor per output) maps x to vectors y, using e.g. a forest of trees. In the leaves of the trees, you have a number of y_i's, and the prediction is the average of those. The tree is built by splitting the data w.r.t. attributes of x, but using a distance between y vectors (likely the squared norm).
If a property holds for all y_i's and is retained under convex combinations, it also holds for all predictions. Monotonicity is such a property, and so is positivity.
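To make this concrete, here is a minimal sketch (not the Syne Tune code; data and names are made up) of multi-output regression with a scikit-learn forest, where each training target is a full monotone curve over fidelities:

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.RandomState(0)
num_configs, num_dims, num_fidelities = 200, 3, 10

X = rng.rand(num_configs, num_dims)
# Synthetic elapsed_time curves: cumulative sums of positive increments,
# hence monotone w.r.t. fidelity
Y = np.cumsum(rng.rand(num_configs, num_fidelities) + 0.1, axis=1)

# Multi-output regression: y in fit() is a matrix, one column per fidelity
model = RandomForestRegressor(n_estimators=50, random_state=0)
model.fit(X, Y)

# Each prediction is a convex combination (average) of training curves,
# so monotonicity w.r.t. fidelity carries over to the predicted curves
Y_pred = model.predict(rng.rand(5, num_dims))
assert np.all(np.diff(Y_pred, axis=1) > 0)
```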
This is also cheaper, because the number of datapoints is no longer multiplied by the number of fidelities, so subsampling is not needed.
It may, of course, also work less well, this is why I am leaving all the current code in there, so folks can choose.
As for seed: yes, I see, you merge data across seeds in order to fit the surrogate to all of it. I am just putting in an option to keep seeds separate. But the default will be what it is right now.
Retaining seeds in the surrogate is useful in order to replicate the variations coming in through different seeds (as each trial typically picks a different seed).
The current code already uses multivariate regression w.r.t. num_objectives, so the y in fit is already a matrix with >1 column. So this should all work, also with XGBoost.
This is actually pretty elegant code in BlackboxSurrogate.
Thanks for the explanation. I guess you meant to use a specific regressor such as a tree method; then I agree you would have some guarantee (it does not hold true if we used an MLP, which is what I had not understood).
Regarding the seed, I am not sure I understand what you mean. Currently, all data points are put together in the supervised dataset, so if you have two seeds, you would have two training examples. Do you mean to change the estimation problem so that a map from num_hyperparameter_dim to num_objectives x num_seeds is learned?
It seems to me that if we were to include seeds in the surrogate, then we should have probabilistic models that sample from a distribution when queried.
No, the alternative is to fit one model per seed, only on the data for that seed. If you have 4 seeds, you get 4 models, each trained on 1/4 of the complete data. But merging the data across seeds will still be the default.
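A minimal sketch of the two options, with made-up helper names (X_by_seed / Y_by_seed map each seed to its arrays):

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

def fit_merged(X_by_seed, Y_by_seed):
    # Default: merge data across seeds into one supervised dataset
    X = np.concatenate(list(X_by_seed.values()))
    Y = np.concatenate(list(Y_by_seed.values()))
    return RandomForestRegressor().fit(X, Y)

def fit_per_seed(X_by_seed, Y_by_seed):
    # Alternative: one model per seed, each trained only on that seed's data
    return {
        seed: RandomForestRegressor().fit(X_by_seed[seed], Y_by_seed[seed])
        for seed in X_by_seed
    }
```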
Of course, makes sense, thanks. I also think that merging seeds should be the default, for efficiency reasons.
You are right, an MLP does not have that property.
This is a bit stuck. I discovered that if benchmark_dehb experiments with lcbench are repeated with RandomForestRegressor instead of 1-NN, results are very poor.
If [old], [pc=True], [pc=False] denote the old code and the new code with predict_curves=X, then:
- 1-NN gives the same results in all 3 cases, but [pc=True] runs faster
- RandomForestRegressor gives the same results for [old] and [pc=False], but is even worse for [pc=True] (though faster)
TODO: Need to first understand and fix issues with RandomForestRegressor.
One simple thing to try is to map elapsed_time -> time_per_resource before fitting a model, and reverse the mapping after prediction. This curve should be easier to fit for methods that rely on targets in order to split up the input space.
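A sketch of that transform, assuming elapsed_time curves are rows of a matrix with one column per fidelity (function names are made up):

```python
import numpy as np

def to_time_per_resource(elapsed_time):
    # Differences along the fidelity axis: time spent per unit of resource
    return np.diff(elapsed_time, axis=1, prepend=0.0)

def to_elapsed_time(time_per_resource):
    # Inverse mapping, applied after prediction: the cumulative sum
    # restores an elapsed_time curve that is monotone by construction
    # (as long as predicted per-resource times are nonnegative)
    return np.cumsum(time_per_resource, axis=1)
```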
OK, I implemented the mapping elapsed_time -> time_per_resource. Results for RandomForestRegressor are quite a bit better than without it, but results with 1-NN are still quite a bit better.
I leave this for now, but this clearly needs further investigation. It may even be that the task becomes too simple with 1-NN?
Relabeling this one, as it is not a bug, but seems to be a general issue with surrogates. I am leaving it open, because there are still some things I'd like to do here.
OK, PR #405 fixes the most obvious problems, while in general we need to be careful with the accuracy of surrogates.