enh: Post-Improving Boosting Models
Summary
TL;DR:
- Update the initial assumptions with boosting models;
- Add new evolutionary mutations tied to boosting models;
- Allow boosting models to consume data with NaNs;
- Allow boosting models to be fitted on GPU.
Motivation
The motivation for refactoring the boosting models (#1155, #1209, #1264) was to:
- Update the implementation and split it into a separate strategy class.
- Allow boosting models to use categorical features without encoding.
- Create a basis for implementing fitting with bagging (#1005), as used in other popular AutoML frameworks.
Results of testing on OpenML are available here. During development, further ideas for improvement arose; they are described below.
Guide-level explanation
1. Updating initial assumptions with boosting models
Add more pipelines containing boosting models to the initial assumptions, and update the presets to use boosting models. Note that boosting models offer several strategies that can be used across different pipelines and presets. You can find more information about the strategies in the boosting frameworks' documentation.
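As a rough sketch of what such an initial assumption could look like through FEDOT's public API (the `PipelineBuilder` usage, the `initial_assumption` argument, and the `'xgboost'`/`'lgbm'`/`'catboost'` operation names are assumptions here, used purely for illustration):

```python
# A minimal sketch, assuming FEDOT's PipelineBuilder and boosting operation
# names are available as shown; not a definitive implementation.
from fedot.api.main import Fedot
from fedot.core.pipelines.pipeline_builder import PipelineBuilder

# Candidate initial assumptions built around boosting models.
boosting_assumptions = [
    PipelineBuilder().add_node('scaling').add_node('xgboost').build(),
    PipelineBuilder().add_node('lgbm').build(),
    PipelineBuilder().add_node('catboost').build(),
]

model = Fedot(
    problem='classification',
    initial_assumption=boosting_assumptions,  # seed the evolution with boosting pipelines
    timeout=5,
)
# model.fit(features=X_train, target=y_train)
```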
2. Evolutionary Mutations for Boosting Models
With the updated parameters, it became possible to add new mutations for pipelines containing boosting models. The following mutations are proposed; a sketch of how a couple of them could be implemented follows the list.
Boosting strategy mutation:
- Switch to another strategy method.
Using category mutation:
- Switch to `enable_categorical` (default: `True`); see the sketch below.
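For illustration, a minimal sketch of what `enable_categorical` switches on at the XGBoost level, assuming a recent XGBoost version where the flag works together with the `hist` tree method:

```python
# pandas 'category' columns are consumed directly, without one-hot encoding.
import pandas as pd
from xgboost import XGBClassifier

X = pd.DataFrame({
    'colour': pd.Categorical(['red', 'green', 'red', 'blue']),
    'size': [1.0, 2.5, 3.0, 0.5],
})
y = [0, 1, 1, 0]

model = XGBClassifier(tree_method='hist', enable_categorical=True)
model.fit(X, y)  # no manual encoding of the 'colour' column needed
```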
Change `early_stopping_rounds` mutation:
- Increase or decrease `early_stopping_rounds` to change the fitting time within the population.
Improving the metric of the XGBoost model mutation:
- Decrease `max_depth`.
- Increase `min_child_weight`, `gamma`, `lambda`.
Improving robustness to noise of the XGBoost model mutation:
- Decrease `subsample`, `colsample_bytree`, `colsample_bylevel`, `colsample_bynode` by some step.
- Switch to the `dart` strategy method.
Improving the metric of the LightGBM model mutation:
- Decrease `learning_rate`.
- Increase `max_bin`, `num_iterations`, `num_leaves`.
- Switch to the `dart` strategy method.
Improving robustness to overfitting of the LightGBM model mutation:
- Decrease `max_bin` and `num_leaves`.
- Increase or decrease `min_data_in_leaf` and `min_sum_hessian_in_leaf`.
- Use `bagging_fraction` and `bagging_freq`.
- Use `feature_fraction`.
- Use regularization methods: `lambda_l1`, `lambda_l2`, `min_gain_to_split`, and `extra_trees`.
- Decrease `max_depth`.
Improving robustness to overfitting of the CatBoost model mutation:
- Increase or decrease `l2_leaf_reg`, `colsample_bylevel`, `subsample`.
- Decrease `max_depth`.
- Increase or decrease `iterations`.
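A minimal sketch of how two of these mutations could be implemented, expressed as pure functions over a hyperparameter dict; the step sizes, bounds, and the 50/50 branch choice are illustrative assumptions, and the wiring into FEDOT's mutation operators is omitted:

```python
# Illustrative sketch of the XGBoost mutations described above.
import random

def improve_metric_xgboost(params: dict) -> dict:
    """Decrease max_depth; increase min_child_weight, gamma and lambda."""
    mutated = dict(params)
    mutated['max_depth'] = max(1, params.get('max_depth', 6) - 1)
    mutated['min_child_weight'] = params.get('min_child_weight', 1) + 1
    mutated['gamma'] = params.get('gamma', 0.0) + 0.1
    mutated['lambda'] = params.get('lambda', 1.0) * 1.5
    return mutated

def improve_noise_robustness_xgboost(params: dict, step: float = 0.1) -> dict:
    """Either decrease the sampling ratios by some step, or switch to dart."""
    mutated = dict(params)
    if random.random() < 0.5:
        for key in ('subsample', 'colsample_bytree',
                    'colsample_bylevel', 'colsample_bynode'):
            mutated[key] = max(step, params.get(key, 1.0) - step)
    else:
        mutated['booster'] = 'dart'
    return mutated
```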
3. Allow boosting models to consume data with NaNs
One of the advantages of boosting methods is their native handling of NaNs in the data. Implementing this feature in the current version requires refactoring the preprocessing step that fills in missing values.
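For example, XGBoost treats NaN as a missing value natively, so imputation can be skipped for such models; a minimal sketch:

```python
# Boosting libraries treat NaN as "missing" natively, so no imputation is needed.
import numpy as np
from xgboost import XGBClassifier

X = np.array([[1.0, np.nan],
              [2.0, 3.0],
              [np.nan, 1.0],
              [4.0, 2.0]])
y = np.array([0, 1, 0, 1])

model = XGBClassifier(tree_method='hist')
model.fit(X, y)  # NaNs are routed down a learned default branch at each split
```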
4. Allow boosting models to be fitted on GPU
The boosting frameworks support GPU acceleration for fitting, so it would be great to add such an option.
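A sketch of the per-library switches that enable GPU fitting; the exact flags depend on the installed versions (for instance, XGBoost below 2.0 uses `tree_method='gpu_hist'` instead of `device='cuda'`):

```python
# Illustrative GPU switches for the three boosting libraries.
from xgboost import XGBClassifier
from lightgbm import LGBMClassifier
from catboost import CatBoostClassifier

xgb_gpu = XGBClassifier(tree_method='hist', device='cuda')  # XGBoost >= 2.0
lgbm_gpu = LGBMClassifier(device='gpu')                     # requires a GPU build
cat_gpu = CatBoostClassifier(task_type='GPU')               # requires CUDA
```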
Unresolved Questions
Is it possible to continue #1005? The main idea was to develop the bagging method used in other AutoML frameworks. The main problem faced and not yet solved is that FEDOT's pipeline generation approach differs from other frameworks: parallelizing the fitting of base models on bootstrapped samples and embedding this approach into the composing process is quite time-consuming and resource-intensive, although it yields more stable and accurate models. One option is to apply this approach after composing; for example, if a boosting model is found in the final pipeline, try to retrain it using this method (see the sketch below).
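As a rough illustration of the post-composing idea, the boosting model from the final pipeline could be wrapped in bagging after the search finishes; the sketch below uses scikit-learn's `BaggingClassifier` (assuming scikit-learn >= 1.2 for the `estimator` keyword) purely as an illustration, not FEDOT's actual mechanism:

```python
# Hypothetical post-composing step: bag the found boosting model.
from sklearn.ensemble import BaggingClassifier
from xgboost import XGBClassifier

bagged_boosting = BaggingClassifier(
    estimator=XGBClassifier(tree_method='hist'),
    n_estimators=10,  # base models fitted on bootstrapped samples
    n_jobs=-1,        # fit the base models in parallel
)
# bagged_boosting.fit(X_train, y_train)
```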
P.S.
I also note that adding a weighted model, or models from the k-nearest-neighbors family, as a meta-model in an ensemble would help diversify pipelines for classification and regression.
Also note that the current method for detecting categorical features is imperfect.