FRAMEWORK SETUP UPDATE: set up the framework flaml's special requirements for automlbenchmark
- FLAML reorganized its default requirements. Some dependencies are no longer installed by default but are still needed to run experiments in automlbenchmark. We moved those into [benchmark].
Thanks for the update! I have some additional questions to figure out how to best integrate this change into the automl benchmark. I noticed this installs the following extra dependencies:
catboost>=0.26
psutil==5.8.0
xgboost==1.3.3
- Are these then automatically used when installed?
- Currently some other frameworks also have specific benchmark modes (e.g. autogluon), but these are set through hyperparameters of the AutoML framework, which means you can run both benchmark and non-benchmark modes from within this benchmark tool. If the extra packages are automatically used, would there still be a way to turn that off through hyperparameters?
- I noticed psutil in the benchmark requirements. Is the default installation incapable of following benchmark constraints now (e.g. duration or cores)?
Hi @PGijsbers, Thank you for the questions. Please find my answers below and let me know if you have further questions or suggestions.
- The three packages will be automatically used when installed (with the [benchmark] option).
- The use of catboost and xgboost can be turned off through our estimator_list parameter. The use of psutil, once installed, cannot be turned off through a hyperparameter for now, but we can add one if necessary.
- The default installation is capable of following the benchmark's constraints. psutil is added for memory control.
(F.Y.I. @sonichi)
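As a rough illustration of the second answer, the sketch below builds an estimator_list that leaves out catboost and xgboost. The learner names in ASSUMED_ESTIMATORS and the helper without_benchmark_learners are assumptions made for illustration and are not taken from this thread, so they should be checked against the installed FLAML version.

```python
# Hypothetical helper; the learner names below are assumed, not confirmed in this thread.
ASSUMED_ESTIMATORS = ["lgbm", "rf", "extra_tree", "xgboost", "catboost", "lrl1"]

# Learners that rely on packages moved into the [benchmark] extras.
BENCHMARK_ONLY = {"xgboost", "catboost"}


def without_benchmark_learners(estimators):
    """Return the estimators that do not need the [benchmark] extras, in order."""
    return [name for name in estimators if name not in BENCHMARK_ONLY]


estimator_list = without_benchmark_learners(ASSUMED_ESTIMATORS)
print(estimator_list)

# With FLAML installed, the list would be passed to fit (not executed here):
#   from flaml import AutoML
#   AutoML().fit(X_train, y_train, task="classification",
#                time_budget=60, estimator_list=estimator_list)
```

This only covers catboost and xgboost; as noted above, the use of psutil cannot currently be switched off this way.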
In principle, we expect modes to be configurable for the end-user not just through what is installed and what isn't but also through one hyperparameter that expresses the user intent. For example, autogluon has "best_quality" and "fast_inference" modes, and mljar has "compete" and "interpretability" modes, to name a few. Inferring the mode through the installed dependencies means users (and AMLB) can not use the same installed environment without "expert knowledge" of exactly what the difference between the modes is (i.e., the user would now need to know catboost and xgboost are not part of the default behavior, and should be turned off through estimator_list). Would you consider a hyperparameter which configures flaml based on user intent (i.e., to distinguish at least "normal use" and benchmark use)?
Hi @PGijsbers, thank you for your suggestion. I understand that adding a mode to configure the expected behavior is good practice, and we will add modes that way once we have different choices of modes in the future. But this benchmark version is needed for the following reason: we want to minimize the dependencies installed by default flaml (we received requests to reduce mandatory dependencies). In other words, the installation of dependencies is exactly what we want to control via the default and [benchmark] versions.
Thank you!
I understand the concern for bloating the default installation with dependencies that aren't necessary. We can use this for now and clearly denote that it is using a benchmark configuration. That said, have you considered that it's also possible to add the hyperparameter and when set to "benchmark" check whether or not the flaml[benchmark] dependencies are installed? That could allow users of flaml[benchmark] to still use the non-benchmark configuration without requiring a second environment, without dependency bloat for flaml (default) users.
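The idea above could look roughly like the following minimal sketch. The `mode` hyperparameter, the function names, and the assumption that the import names match the flaml[benchmark] package names are all hypothetical; this is not FLAML's actual API.

```python
import importlib.util

# Assumption: these import names correspond to the flaml[benchmark] extras listed earlier.
BENCHMARK_MODULES = ("catboost", "psutil", "xgboost")


def missing_modules(modules=BENCHMARK_MODULES):
    """Return the top-level modules that cannot be found in this environment."""
    return [m for m in modules if importlib.util.find_spec(m) is None]


def resolve_mode(mode="default", modules=BENCHMARK_MODULES):
    """Validate a hypothetical `mode` hyperparameter against installed packages."""
    if mode not in ("default", "benchmark"):
        raise ValueError(f"unknown mode: {mode!r}")
    if mode == "benchmark":
        missing = missing_modules(modules)
        if missing:
            # Only the benchmark mode requires the extra dependencies.
            raise ImportError(
                "mode='benchmark' needs flaml[benchmark]; missing: "
                + ", ".join(missing)
            )
    return mode
```

With a check like this, an environment that has the extras installed can serve both modes, while a default installation only fails when the benchmark mode is explicitly requested.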
This is a good suggestion! I will add such a configurable option (probably within a week). Will need to update this PR once it is added. Thank you!
Any updates? If not I will probably simply raise an error when a user wants to run regular flaml when the benchmark dependencies are already installed (the error will tell them to re-install flaml without benchmark dependencies, or manually remove those dependencies if they want to run flaml in regular mode).
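A guard of the kind described here might be sketched as follows. The function names and the choice to check only catboost and xgboost (psutil is often present for unrelated reasons) are assumptions for illustration, not the check that was actually added to the benchmark.

```python
from importlib import metadata

# Assumption: distribution names of the extras that change FLAML's behavior.
BENCHMARK_PACKAGES = ("catboost", "xgboost")


def installed_packages(names=BENCHMARK_PACKAGES):
    """Return the subset of `names` that is installed in this environment."""
    found = []
    for name in names:
        try:
            metadata.version(name)
        except metadata.PackageNotFoundError:
            continue
        found.append(name)
    return found


def ensure_regular_environment(names=BENCHMARK_PACKAGES):
    """Raise if benchmark dependencies are present while regular mode is requested."""
    extras = installed_packages(names)
    if extras:
        raise RuntimeError(
            "Regular flaml was requested, but benchmark dependencies are installed: "
            + ", ".join(extras)
            + ". Re-install flaml without [benchmark] or remove these packages."
        )
```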
Added benchmark support in #528, but looking forward to an updated integration script that can use the dependencies as intended for the new release :)
Thanks, @PGijsbers, an issue has been created accordingly.
Is there an ETA on v2.0 (and updated integration script) btw?
Hi @PGijsbers, thank you for asking. We don't have an ETA for now but will let you know once we have one. The automl functionality is not affected much by the changes in v2.0, so if you plan to do experiments, please feel free to use the older version. Thank you!
Thanks for letting us know, will do!