LightGBM
categorical_feature is not validated when it is a list of int
Description
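When categorical_feature is a list of int, its values are not checked against the number of features. An index greater than or equal to the number of columns (for example 3 in a three-column dataset) is accepted without error.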
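The missing validation amounts to a bounds check on each integer index. A minimal sketch, assuming any feature names have already been resolved to integer indices. `check_categorical_indices` is a hypothetical helper for illustration, not part of LightGBM's API:

```python
from typing import Sequence


def check_categorical_indices(categorical_feature: Sequence[int], num_features: int) -> None:
    """Hypothetical helper (not LightGBM API): raise if any index is outside [0, num_features)."""
    for idx in categorical_feature:
        # Negative indices are rejected as well; that choice is an assumption of this sketch.
        if not 0 <= idx < num_features:
            raise ValueError(
                f"categorical_feature index {idx} is out of range for {num_features} features"
            )


# Three columns, as in the reproducible example: indices 0, 1 and 2 are valid.
check_categorical_indices([0, 2], num_features=3)  # passes silently
try:
    check_categorical_indices([0, 3], num_features=3)
except ValueError as err:
    print(err)  # categorical_feature index 3 is out of range for 3 features
```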
Reproducible example
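```python
import lightgbm as lgb
import numpy as np
import pandas as pd

# Three feature columns, so the valid column indices are 0, 1 and 2.
X = pd.DataFrame(
    {
        "fea0": np.random.randint(0, 10, 6),
        "fea1": np.random.randint(0, 5, 6),
        "fea2": np.random.randint(0, 2, 6),
    }
)
print(X)
y = pd.Series([0, 0, 0, 1, 1, 1])

# Index 3 refers to a column that does not exist, yet no error is raised.
train = lgb.Dataset(data=X, label=y, categorical_feature=[0, 3])
lgb.Booster(train_set=train)
```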
By contrast, passing the equivalent names, categorical_feature=["fea0", "fea3"], does raise an error:
Wrong type(str) or unknown name(fea3) in categorical_feature
Environment info
LightGBM version or commit hash: 3.3.5
Command(s) you used to install LightGBM:
pip install lightgbm
Additional Comments
Hey @pangyouzhen, thanks for raising this! Are you interested in working on a fix? It'd be just adding that check here: https://github.com/microsoft/LightGBM/blob/f74875ed60e696ee7d223ddb409e66f51bddbb47/python-package/lightgbm/basic.py#L1814-L1819 and a test that verifies it works as expected.
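A self-contained sketch of such a check together with tests for it. It assumes a hypothetical `resolve_categorical_feature` function that maps names or integer indices to column indices; the real code at the linked lines of `basic.py` may be structured differently, and the error messages here are illustrative only:

```python
from typing import List, Sequence, Set, Union


def resolve_categorical_feature(
    categorical_feature: Sequence[Union[int, str]], feature_names: List[str]
) -> Set[int]:
    """Hypothetical resolver: map names or int indices to column indices, validating both."""
    feature_dict = {name: i for i, name in enumerate(feature_names)}
    indices: Set[int] = set()
    for item in categorical_feature:
        if isinstance(item, str):
            if item not in feature_dict:
                raise ValueError(f"unknown name({item}) in categorical_feature")
            indices.add(feature_dict[item])
        elif isinstance(item, int) and not isinstance(item, bool):
            # The check this issue asks for: reject indices outside [0, num_features).
            if not 0 <= item < len(feature_names):
                raise ValueError(f"index {item} in categorical_feature is out of range")
            indices.add(item)
        else:
            raise TypeError(f"wrong type({type(item).__name__}) in categorical_feature")
    return indices


def test_out_of_range_feature_raises() -> None:
    names = ["fea0", "fea1", "fea2"]
    # "fea3" and 3 both refer to a missing column, so both inputs should be rejected.
    for bad in (["fea0", "fea3"], [0, 3]):
        try:
            resolve_categorical_feature(bad, names)
        except ValueError:
            continue
        raise AssertionError(f"expected ValueError for {bad}")


def test_valid_inputs_resolve() -> None:
    names = ["fea0", "fea1", "fea2"]
    assert resolve_categorical_feature([0, 2], names) == {0, 2}
    assert resolve_categorical_feature(["fea0", "fea2"], names) == {0, 2}


if __name__ == "__main__":
    test_out_of_range_feature_raises()
    test_valid_inputs_resolve()
    print("ok")
```

Both `test_*` functions follow pytest naming, so they would also be collected if the file were run under pytest.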
take