LightGBM
Pythonw.exe has Stopped Working when setting 'monotone_constraints'.
Description
2022/4/26 On Windows 10, the Python process crashed when I called lightgbm.train() with the monotone_constraints parameter set:
{ 'objective': 'regression',
'verbose': -1,
'monotone_constraints': [1] * 15 + [0] * 23 + [-1] * 4 }
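Since LightGBM expects one monotone constraint per feature, in column order, a hand-counted expression like the one above is easy to get out of step with the data. The following is a minimal sketch of building the list from feature names instead. The helper name `build_monotone_constraints` and its arguments are hypothetical and not part of LightGBM.

```python
def build_monotone_constraints(feature_names, increasing=(), decreasing=()):
    """Return one constraint per feature: 1, -1, or 0 (unconstrained).

    Hypothetical helper for illustration only; it is not a LightGBM API.
    """
    increasing = set(increasing)
    decreasing = set(decreasing)

    # Catch typos in the requested feature names early.
    unknown = (increasing | decreasing) - set(feature_names)
    if unknown:
        raise ValueError(f"unknown features: {sorted(unknown)}")

    # A feature cannot be both increasing and decreasing.
    overlap = increasing & decreasing
    if overlap:
        raise ValueError(f"conflicting constraints for: {sorted(overlap)}")

    return [
        1 if name in increasing else -1 if name in decreasing else 0
        for name in feature_names
    ]


# Feature layout of the sample code in this issue: "x" plus x0..x49.
feature_names = ["x"] + [f"x{i}" for i in range(50)]
constraints = build_monotone_constraints(feature_names, increasing=["x"])
```

The resulting list can be passed as `params['monotone_constraints']`. Its length always equals the number of features, so a length mismatch can be ruled out as the cause of a failure.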
The Windows Event Viewer records the crash as follows.
Problem signature
P1: python.exe
P2: 3.9.5150.1013
P3: 60903347
P4: ucrtbase.dll
P5: 10.0.19041.789
P6: 2bd748bf
P7: 000000000007286e
P8: c0000409
P9: 0000000000000007
P10:
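The signature above can be decoded a little further. The sketch below assumes that P8 is the NTSTATUS exception code and P9 is the fast-fail subcode, which is the usual layout for this kind of report but is not stated in the event itself. The constant names come from the Windows SDK headers (ntstatus.h and winnt.h), and only a few are listed here.

```python
# Partial lookup tables; names as defined in the Windows SDK headers.
NTSTATUS_NAMES = {
    0xC0000005: "STATUS_ACCESS_VIOLATION",
    0xC0000409: "STATUS_STACK_BUFFER_OVERRUN",
}
FAST_FAIL_NAMES = {
    2: "FAST_FAIL_STACK_COOKIE_CHECK_FAILURE",
    5: "FAST_FAIL_INVALID_ARG",
    7: "FAST_FAIL_FATAL_APP_EXIT",
}


def describe_crash(p8, p9):
    """Map the hex strings from P8/P9 to symbolic names where known.

    Assumption: P8 is the exception code and P9 the fast-fail subcode.
    """
    status = int(p8, 16)
    subcode = int(p9, 16)
    return (
        NTSTATUS_NAMES.get(status, hex(status)),
        FAST_FAIL_NAMES.get(subcode, str(subcode)),
    )


status_name, subcode_name = describe_crash("c0000409", "0000000000000007")
```

Under that assumption, the pair reads as STATUS_STACK_BUFFER_OVERRUN with FAST_FAIL_FATAL_APP_EXIT. That subcode is what the C runtime raises when abort() is called, so the crash may be a deliberate abort inside native code (for example after an unhandled C++ exception) rather than a literal buffer overrun.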
2022/5/9 I found that combining 'monotone_constraints' with columns of pandas 'category' dtype triggers the crash.
Reproducible example
2022/4/26 I cannot share the original code because it handles confidential business data. I am working on a reproducible example.
2022/5/9 I created the following reproducible sample code.
# import library
import numpy as np
import pandas as pd
import altair as alt
import lightgbm as lgb
# create example dataset
size = 100
df = pd.DataFrame(
    {
        "x": np.linspace(0, 10, size),
        "y": np.linspace(0, 10, size)**2 + 10 - (20 * np.random.random(size))
    } | {f"x{i}": np.random.random(size) if i < 40 else np.random.randint(0, 10, size) for i in range(0, 50)}
)
# With set_category = False, training the LightGBM model works.
# With set_category = True, columns x40..x49 become 'category' dtype and training crashes.
set_category = False
if set_category:
    df[[f'x{i}' for i in range(40, 50)]] = df[[f'x{i}' for i in range(40, 50)]].astype('category')
df.head(5)
print(df.dtypes)
# train LightGBM model with monotone constraints
lgb_train = lgb.Dataset(df.drop('y', axis=1), df["y"])
params = {
    'objective': 'mse',
    'verbose': -1,
    'num_threads': 8,
    'min_child_samples': 5,
    # one entry per feature, in column order: x, then x0..x39, then x40..x49
    'monotone_constraints': [1] + [0] * 40 + [0] * 10,
}
monotone_model = lgb.train(
    params,
    lgb_train,
    num_boost_round=100,
)
# plot the x-dependence of the model outputs
df_tmp = df.copy()
df_tmp[[f"x{i}" for i in range(0, 40)]] = 0.5
df_tmp[[f"x{i}" for i in range(40, 50)]] = 0
monotone_output = pd.DataFrame(
    {
        "x": df_tmp["x"],
        "y": df_tmp["y"],
        "y_pred": monotone_model.predict(df_tmp.drop('y', axis=1))
    }
)
alt.Chart(monotone_output).mark_point().encode(
    x="x",
    y="y"
) + alt.Chart(monotone_output).mark_line().encode(
    x="x",
    y="y_pred",
    color=alt.value("red")
)
Environment info
LightGBM version or commit hash: 3.3.2
Command(s) you used to install LightGBM
pip install lightgbm
Runtime environment: x64 Windows 10 Pro 20H2, Microsoft Visual C++ 2015-2022 Redistributable (x64) 14.31.31103
Additional Comments
Because the crash occurred in a VC++ runtime module, I reinstalled the redistributable from the Microsoft site and tried again, but the same error occurred. I also watched Task Manager while lightgbm.train() was running, and the process did not appear to be running out of memory.
I have added the sample code and updated the comments. Setting 'set_category' to True in the sample code reproduces the error, which points to the combination of 'monotone_constraints' and pandas 'category' dtype columns.
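Since the sample code trains successfully while the columns hold plain integers, one possible stopgap is to replace each 'category' column with its integer codes before building the lgb.Dataset. This is a minimal sketch, not a fix. The helper name `category_columns_to_codes` is hypothetical, and the conversion changes the model, because LightGBM then treats those columns as ordinary numeric features rather than as categorical ones.

```python
import pandas as pd


def category_columns_to_codes(df):
    """Return a copy of df with 'category' columns replaced by integer codes.

    Hypothetical helper for illustration; also returns the converted names.
    """
    out = df.copy()
    cat_cols = list(out.select_dtypes(include="category").columns)
    for col in cat_cols:
        # .cat.codes maps each category to its position in .cat.categories
        # (missing values become -1).
        out[col] = out[col].cat.codes
    return out, cat_cols


# Small stand-in for the DataFrame used in the sample code.
example = pd.DataFrame(
    {
        "x": [0.1, 0.2, 0.3],
        "x40": pd.Series([3, 1, 3], dtype="category"),
    }
)
converted, converted_cols = category_columns_to_codes(example)
```

With the sample code, the converted frame would be passed to lgb.Dataset in place of `df.drop('y', axis=1)`. Whether this avoids the crash in every case is an assumption based only on the behaviour observed with set_category = False.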