Avoid costly re-building of pipelines
Currently, SMAC suggests hyperparameter configurations that are independent of the dataset size. For example, the hyperparameter classifier:max_features, which is specified between zero and one, is transformed according to max_features = int(n_features ** classifier:max_features). Assuming the dataset in question has only 10 features, SMAC does not know that most values of the tuned hyperparameter map to the same hyperparameter applied to the actual model. Therefore, one needs to track the 'actual' hyperparameters after transformation, check whether they have already been used, and if so return a cached function value to SMAC.
Initial experiments suggest that 1-2% of the overall runs are actually re-optimizations.
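A minimal sketch of the proposed bookkeeping, assuming a SMAC-style objective that receives a configuration dict; train_and_score is a hypothetical stand-in for fitting the pipeline, and only the transformation int(n_features ** x) is taken from the issue itself:

```python
# Sketch: cache function values keyed on the *transformed* hyperparameters,
# so that two float configurations building the same pipeline share a result.
def make_cached_objective(n_features, train_and_score):
    cache = {}

    def objective(config):
        # Apply the same transformation the pipeline would apply.
        max_features = int(n_features ** config["classifier:max_features"])

        # Key on the 'actual' hyperparameter after transformation.
        key = ("classifier:max_features", max_features)
        if key not in cache:
            cache[key] = train_and_score(max_features=max_features)
        return cache[key]

    return objective
```

In a real integration the cache key would of course need to cover all transformed hyperparameters of the pipeline, not just this one.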
I never thought about this the other way around; normally I thought of float values as a way around unknown bounds, but you're right, many of these float parameters map to the same discrete value. I.e., a max_features in [0, 1] is more feasible than being discrete in [0, N] where N is unknown and would have to be chosen heuristically.
Duplicate values are also localized when considering floats: in your example with n_features=10, SMAC will re-evaluate the same configuration for a whole interval of float values (e.g., everything in [0.1, 0.2) maps to the same max_features). As the real value of N grows, this problem shrinks, and it requires no heuristic choice of N.
In the other case, with discrete values for the HP, we would end up with only the values [0, 10] being meaningful, while [11, N] would all be re-evaluations. This problem gets slowly better as the real N grows, but it requires a heuristic choice of N beforehand that must not be much greater than the real one (see the sketch below).
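A small illustration of how the float encoding behaves (a sketch, not auto-sklearn code), counting how many distinct transformed values a dense grid of floats in [0, 1) produces under the issue's transformation int(n_features ** x):

```python
# For small n_features almost all float samples collide on the same
# transformed value; for large n_features far fewer do.
def transformed(x, n_features):
    return int(n_features ** x)

for n_features in (10, 1000):
    distinct = {transformed(i / 1000, n_features) for i in range(1000)}
    print(f"n_features={n_features}: 1000 float samples in [0, 1) "
          f"map to {len(distinct)} distinct values")
```

With n_features=10, the 1000 samples collapse to just 9 distinct values, so nearly every run would be a cache hit; with n_features=1000, a much larger share of samples is distinct, matching the point that the problem shrinks as the real N grows.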
It seems that for hyperparameters where we can reasonably estimate an upper bound of N, and be fairly certain that we will always be close to this upper bound, we should do so. However, I do not think coming up with this rule is worth it; maybe we should switch most things to float-based instead.