machinelearning
Default parameters not taken into account in GenerateCandidateConfigurations
Per my understanding:
- Default values for trainers are set based on experience from a wide range of datasets. Therefore, they should be a good starting point.
- AutoML starts by running with the default values and then runs 10 random experiments around them.
- The results of the random runs can be used to estimate the direction of the next parameter search.
However, it seems that GenerateCandidateConfigurations (SmacSweeper.cs) does not receive the parameters of the first run, the one with default parameters, in its previousRuns argument. This likely means that forestPredictor and GreedyPlusRandomSearch do not know what the defaults are or what metric the first run achieved, so they can rely only on the random experiments. This may have a detrimental effect on the next suggested parameters, because previousRuns then contains no known good combination of parameters.
I believe the default parameters should be added to the history, and therefore to the previousRuns variable, so that SmacSweeper can use them to propose new parameters.
If my understanding is correct, this may also affect ModelBuilder, although it probably does not use SmacSweeper.
Background: I have set the defaults for LightGBM to the best values I found for this dataset. I'd like to use AutoML to see whether user-specific data would benefit from small adjustments. However, since the first run is near the optimum, I am worried that it will take much longer to find better parameters if the sweeper has to rely only on random search to map the search range. It would be better to start from the "last optimum", then try random points around it, and finally try to find a new optimum.
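To make the search order I have in mind concrete, here is a minimal sketch in Python. It is not ML.NET code: the function name `candidates_around`, the parameter names, the bounds and the Gaussian perturbation are all assumptions chosen for illustration. It evaluates the known-good point first and then samples random points close to it.

```python
import random

def candidates_around(incumbent, bounds, n_neighbors=10, scale=0.1, seed=0):
    """Return the incumbent first, then n_neighbors random points near it.

    Hypothetical helper, not part of ML.NET. Each parameter is shifted by
    Gaussian noise whose standard deviation is `scale` times the width of
    that parameter's range, and the result is clipped to the range.
    """
    rng = random.Random(seed)
    out = [dict(incumbent)]  # the "last optimum" is always tried first
    for _ in range(n_neighbors):
        point = {}
        for name, value in incumbent.items():
            low, high = bounds[name]
            step = rng.gauss(0.0, scale * (high - low))
            point[name] = min(high, max(low, value + step))
        out.append(point)
    return out

# Illustrative values only; real code would round integer parameters.
defaults = {"learning_rate": 0.1, "num_leaves": 31.0}
bounds = {"learning_rate": (0.01, 1.0), "num_leaves": (2.0, 256.0)}
candidates = candidates_around(defaults, bounds)
```

The point of the sketch is only the ordering: the default configuration and its metric are part of what the later model-based step gets to see.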
Why I believe the parameters are not available: the history entry for the first run has its params set to NULL, so that run is filtered out of viableRuns.
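The effect I am describing can be reproduced with a small Python sketch. It is a simplified model of the behaviour, not the actual SmacSweeper code: the `Run` class, `viable_runs`, `with_defaults_recorded`, the parameter values and the metric values are all made up for illustration, and it assumes the default run is the only one stored without parameters.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Run:
    params: Optional[dict]  # None models the NULL params of the first run
    metric: float

def viable_runs(history):
    """Keep only runs whose parameters are known (assumed filter)."""
    return [r for r in history if r.params is not None]

def with_defaults_recorded(history, defaults):
    """Proposed change: store the defaults instead of None for such runs."""
    return [Run(dict(defaults), r.metric) if r.params is None else r
            for r in history]

DEFAULTS = {"learning_rate": 0.1, "num_leaves": 31}  # illustrative values

history = [
    Run(None, 0.92),                                    # default run
    Run({"learning_rate": 0.3, "num_leaves": 12}, 0.85),  # random run
]

before = viable_runs(history)
after = viable_runs(with_defaults_recorded(history, DEFAULTS))
```

In this sketch `before` contains only the random run, so the best-known result never reaches the sweeper, while `after` contains both runs, including the default configuration with its metric.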