categorical_feature is not checked when categorical_feature is list of int

Open pangyouzhen opened this issue 1 year ago • 2 comments

Description

When categorical_feature is a list of int, the indices are not validated: an index can be greater than or equal to the number of features and no error is raised.

Reproducible example

import lightgbm as lgb
import pandas as pd
import numpy as np

X = pd.DataFrame(
    {
        "fea0": np.random.randint(0,10, 6),
        "fea1": np.random.randint(0,5,6),
        "fea2": np.random.randint(0,2,6)
    }
)

print(X)
y = pd.Series([0,0,0,1,1,1])

# X has only 3 columns (valid indices 0-2), so index 3 is out of range
train = lgb.Dataset(data=X, label=y, categorical_feature=[0, 3])
lgb.Booster(train_set=train)

By contrast, an error is raised when the features are given by name. Setting categorical_feature=["fea0","fea3"] fails with:

Wrong type(str) or unknown name(fea3) in categorical_feature

Environment info

LightGBM version or commit hash: 3.3.5

Command(s) you used to install LightGBM:

pip install lightgbm

Additional Comments

pangyouzhen • Apr 03 '23 08:04

Hey @pangyouzhen, thanks for raising this! Are you interested in working on a fix? It'd be just adding that check here: https://github.com/microsoft/LightGBM/blob/f74875ed60e696ee7d223ddb409e66f51bddbb47/python-package/lightgbm/basic.py#L1814-L1819 and a test that verifies it works as expected.
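
For illustration, here is a minimal sketch of what such a range check could look like. The helper name `validate_categorical_indices` and its error message are hypothetical and are not taken from LightGBM's code. Treating negative indices as invalid is also an assumption.

```python
from typing import Iterable, List


def validate_categorical_indices(
    categorical_feature: Iterable[int], num_features: int
) -> List[int]:
    """Hypothetical helper (not part of LightGBM): reject out-of-range indices.

    Returns the indices as a list if every one satisfies
    0 <= index < num_features, otherwise raises ValueError.
    """
    indices = list(categorical_feature)
    # Assumption: negative indices are treated as invalid rather than
    # being interpreted as counting from the end.
    out_of_range = [i for i in indices if not 0 <= i < num_features]
    if out_of_range:
        raise ValueError(
            f"categorical_feature index(es) {out_of_range} out of range "
            f"for data with {num_features} features"
        )
    return indices


# The example from the issue has 3 features, so [0, 3] should be rejected.
try:
    validate_categorical_indices([0, 3], num_features=3)
except ValueError as err:
    print(err)
```

A test for the real fix could follow the same shape: build a Dataset with an out-of-range index and assert that constructing it raises.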

jmoralez • Apr 19 '23 21:04

take

pangyouzhen • Jun 07 '23 09:06