
LightGBM trains much slower than CatBoost.

Open fengshansi opened this issue 1 year ago • 15 comments

On Ubuntu 22.04.2 LTS, with Python 3.11.4 and LightGBM 4.3.0. The data size is 3,000 rows. Params:

{
        "boosting_type": "gbdt",  
        "objective": "binary",  
        "verbose": -1,
        "n_jobs": -1,
        "device": "cpu",
        "random_state": 1,
        "metric": "None",
        "learning_rate": 0.03,
    }

feval:

from sklearn.metrics import f1_score

def weighted_f1_score(preds, train_data):
    labels = train_data.get_label()
    preds_binary = (preds > 0.5).astype(int)
    f1 = f1_score(labels, preds_binary, average="weighted")
    return "weighted_f1_score", f1, True

I use several categorical features.

lightgbm.train(
    params=params,
    train_set=train_dataset,
    num_boost_round=iterations,
    feval=feval,
    categorical_feature=current_cat_feature,
    callbacks=[
        lightgbm.early_stopping(50, first_metric_only=False),
        lightgbm.log_evaluation(period=20, show_stdv=True),
    ],
)
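To compare the two frameworks fairly, both training calls can be wrapped in the same timer. A minimal sketch (the `timed` helper is hypothetical; `params`, `train_dataset`, etc. would be the objects from the snippet above):

```python
import time

def timed(fn, *args, **kwargs):
    # Run any callable and return (result, elapsed wall-clock seconds).
    start = time.perf_counter()
    result = fn(*args, **kwargs)
    return result, time.perf_counter() - start

# Example with a cheap stand-in for lightgbm.train:
result, seconds = timed(sum, range(1000))
print(result, f"{seconds:.6f}s")
```

The same wrapper would then time `lightgbm.train(...)` and the CatBoost fit under identical conditions.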

It takes me 5 minutes to train, much slower than CatBoost.

fengshansi avatar May 16 '24 13:05 fengshansi

Hey @fengshansi, thanks for using LightGBM. Unfortunately, this isn't enough information; we'd also need the following:

  • How many iterations are you running?
  • At which iteration is LightGBM stopping?
  • At which iteration is CatBoost stopping?
  • Which parameters are you using for CatBoost?
  • How many features do you have?
  • Are you also using your custom metric in catboost?

For 3,000 samples, 5 minutes sounds like a lot, so I'm guessing your custom metric is the bottleneck here, but it's very hard to tell from just this information.
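One way to test the custom-metric hypothesis is to time the per-call cost at training scale, outside of LightGBM entirely. A stdlib-only sketch using a stand-in thresholding metric (the real check would time scikit-learn's `f1_score` the same way):

```python
import random
import time

n = 3000  # dataset size from the report
labels = [random.randint(0, 1) for _ in range(n)]
preds = [random.random() for _ in range(n)]

def threshold_accuracy(preds, labels):
    # Same shape of work as the feval above: threshold at 0.5, then compare.
    hits = sum((p > 0.5) == (y == 1) for p, y in zip(preds, labels))
    return hits / len(labels)

start = time.perf_counter()
for _ in range(150):  # one metric call per boosting iteration
    threshold_accuracy(preds, labels)
print(f"{time.perf_counter() - start:.3f}s for 150 calls")
```

If 150 calls of the real metric account for most of the 5 minutes, the metric is indeed the bottleneck.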

jmoralez avatar May 16 '24 15:05 jmoralez

Thank you for your help. My answer is:

  • The max iterations is 300, with early stopping of 60, for both LightGBM and CatBoost.
  • LightGBM stops at iteration 90, so it runs 150 iterations (90 best + 60 early-stopping rounds).
  • For CatBoost, I actually use Optuna with 50 trials to search parameters. Even running it 50 times takes 1 minute 3 seconds; a single run of 70 iterations without early stopping takes 0.2 seconds. All timings are from Jupyter Notebook.
  • The params for the 50 trials are:
    "learning_rate": trial.suggest_float("learning_rate", 0.001, 0.1, log=True),
    "depth": trial.suggest_int("depth", 1, 10),
    "subsample": trial.suggest_float("subsample", 0.05, 1.0),
    "colsample_bylevel": trial.suggest_float("colsample_bylevel", 0.05, 1.0),
    "min_data_in_leaf": trial.suggest_int("min_data_in_leaf", 1, 100),
  • 10 categorical features and 10 numeric features.
  • I don't use a custom metric in CatBoost; CatBoost offers macro F1.
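A back-of-envelope check on the numbers above shows why this looks abnormal: with 150 iterations in roughly 5 minutes, LightGBM would be spending about 2 seconds per tree on only 3,000 rows.

```python
lgbm_seconds = 5 * 60   # ~5 minutes reported
lgbm_iters = 90 + 60    # best iteration + early-stopping rounds = 150
per_iter = lgbm_seconds / lgbm_iters
print(per_iter)  # → 2.0 seconds per boosting iteration
```

Healthy LightGBM runs on a dataset this small typically take milliseconds per iteration, which is consistent with the metric or the environment, not tree building, dominating the time.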

fengshansi avatar May 16 '24 16:05 fengshansi

How long does it take if you remove your custom metric?

jmoralez avatar May 16 '24 16:05 jmoralez

Thanks for using LightGBM. Could you also provide information about how catboost is used? In my experience, the speed of catboost varies a lot depending on the tree structure you select and the boosting mode. These choices often make trade-offs between speed and performance.
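For reference, the CatBoost options behind these trade-offs include `grow_policy` (tree structure) and `boosting_type` (boosting mode). The values below are illustrative examples from the CatBoost parameter reference, not a recommendation:

```python
# Illustrative options only; they would be passed as CatBoostClassifier(**opts).
opts = {
    "grow_policy": "SymmetricTree",  # oblivious trees (the default); "Depthwise" and "Lossguide" are alternatives
    "boosting_type": "Plain",        # faster; "Ordered" reduces overfitting but is slower
}
print(sorted(opts))
```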

shiyu1994 avatar May 16 '24 16:05 shiyu1994

> How long does it take if you remove your custom metric?

Also 5 minutes. I use metric= "binary_logloss".

fengshansi avatar May 16 '24 16:05 fengshansi

> Thanks for using LightGBM. Could you also provide information about how CatBoost is used? In my experience, the speed of CatBoost varies a lot depending on the tree structure you select and the boosting mode. These choices often make trade-offs between speed and performance.

{
    "iterations": 300,
    "learning_rate": 0.07116892811065063,
    "depth": 5,
    "loss_function": "Logloss",
    "verbose": 20,
    "eval_metric": "TotalF1:average=Macro",
    "subsample": 0.2697512982046929,
    "colsample_bylevel": 0.932255235452595,
    "early_stopping_rounds": 60,
    "min_data_in_leaf": 98,
}

I use a CatBoostClassifier.

fengshansi avatar May 16 '24 16:05 fengshansi

Without data and working code, I fear we are stuck here.

mayer79 avatar May 27 '24 09:05 mayer79

Here is code and data https://github.com/fengshansi/lgbm_compare.

fengshansi avatar May 27 '24 12:05 fengshansi

@fengshansi can you try using the same parameters in both? For example, you're setting 0.03 as the learning rate for LightGBM and 0.07 for CatBoost, so CatBoost should converge faster. Also, the default num_leaves in LightGBM is 31, while you're using a depth of 6 in CatBoost, which produces 64 leaves.
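The leaf-count point follows from CatBoost growing symmetric (oblivious) trees: a depth-d tree has 2**d leaves, so depth 6 gives more capacity per tree than LightGBM's default of 31 leaves. A quick check:

```python
catboost_depth = 6
catboost_leaves = 2 ** catboost_depth
lightgbm_default_leaves = 31
print(catboost_leaves, lightgbm_default_leaves)  # → 64 31
```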

jmoralez avatar May 28 '24 16:05 jmoralez

@fengshansi: On my laptop (8 threads), running your two notebooks gives:

LightGBM

[timing screenshot]

CatBoost

[timing screenshot]

Thus, LightGBM is 4-5 times faster (using pip install).

mayer79 avatar Jun 01 '24 13:06 mayer79

Unbelievable; my LightGBM run takes nearly 5 minutes.

fengshansi avatar Jun 01 '24 13:06 fengshansi

Oops :-). I reset the notebook kernels before running each of them.

mayer79 avatar Jun 01 '24 13:06 mayer79

I reinstalled LightGBM, but it is still very slow, with Python 3.11.4 and lightgbm 4.3.0.

fengshansi avatar Jun 01 '24 13:06 fengshansi

It may be an issue with the Python and LightGBM versions. With Python 3.11.9 and lightgbm 4.3.0 it takes me over half an hour, but with Python 3.10.13 and lightgbm 4.1.0 it only takes a few minutes.

JinProton avatar Jun 12 '24 03:06 JinProton
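When chasing a version-dependent slowdown like this, it helps to record the exact interpreter and package versions alongside each timing. A stdlib-only sketch:

```python
import sys
import importlib.metadata

# Log the interpreter version and the installed versions of the
# libraries being compared; missing packages are reported, not fatal.
print("python", sys.version.split()[0])
for pkg in ("lightgbm", "catboost"):
    try:
        print(pkg, importlib.metadata.version(pkg))
    except importlib.metadata.PackageNotFoundError:
        print(pkg, "not installed")
```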

> It may be an issue with the Python and LightGBM versions. With Python 3.11.9 and lightgbm 4.3.0 it takes me over half an hour, but with Python 3.10.13 and lightgbm 4.1.0 it only takes a few minutes.

Thank you very much. I will try it.

fengshansi avatar Jun 13 '24 09:06 fengshansi

This issue has been automatically closed because it has been awaiting a response for too long. When you have time to work with the maintainers to resolve this issue, please post a new comment and it will be re-opened. If the issue has been locked for editing by the time you return to it, please open a new issue and reference this one. Thank you for taking the time to improve LightGBM!

github-actions[bot] avatar Jul 22 '24 04:07 github-actions[bot]