[python-package] Fitting on a polars DataFrame fails due to missing setter for feature_names_in_
Description
When trying to fit a LightGBM model using a polars DataFrame as input, the code fails with the following AttributeError:
AttributeError: property 'feature_names_in_' of 'LGBMRegressor' object has no setter
Subclassing the model and defining a feature_names_in_ property with a setter fixes the issue.
This error does not occur when using a pandas DataFrame as input (pandas 2.2.2).
Reproducible example
import lightgbm as lgb
import numpy as np
import polars as pl

n = 500
rng = np.random.default_rng(42)
data = {"x1": rng.integers(0, 2, size=n), "x2": rng.integers(0, 2, size=n)}
df = pl.DataFrame(data)
y = data["x1"] + data["x2"] + data["x1"] * data["x2"]
y = y + rng.normal(scale=0.01, size=n)

parameters = {
    "learning_rate": 0.1,
    "min_data_in_bin": 1,
    "min_data_in_leaf": 1,
    "num_iterations": 3,
    "num_leaves": 4,
    "verbosity": -1,
}

# This fails with an AttributeError
regressor = lgb.LGBMRegressor(**parameters)
regressor.fit(df, y).predict(df)


# Rerunning with the PatchedRegressor fixes the issue
class PatchedRegressor(lgb.LGBMRegressor):
    @property
    def feature_names_in_(self):
        return self._feature_name

    @feature_names_in_.setter
    def feature_names_in_(self, x):
        self._feature_name = x


regressor = PatchedRegressor(**parameters)
regressor.fit(df, y).predict(df)
Environment info
LightGBM: 4.6.0, polars: 1.22.0, numpy: 2.1.3, Python: 3.11.11
Additional Comments
I'll be investigating this week 😅
Hi, I've just run into this issue as well. Is there any update on the progress of this?
I'd be happy to help resolve it if needed 😄
Sorry, I obviously did not get to this yet 😅
> I'd be happy to help resolve it if needed
Thanks for the offer @ErikBavenstrand! Given that LightGBM does not officially support polars currently (I'm surprised that passing polars data frames to LightGBM "just worked" in a past release), I am working on adding proper support (which will be tracked in https://github.com/microsoft/LightGBM/issues/6204).
Thank you!
I did some further experimentation on version 4.5.0 and found that passing polars DataFrames directly was significantly slower than passing the underlying arrow tables to LightGBM during .fit. I did not investigate further why this happens, but I suspect it has to do with how validation datasets are added and converted from polars to the internal representation. 😄
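For illustration, a rough sketch of that workaround (assuming, as described above for 4.5.0, that the scikit-learn estimator accepts a pyarrow Table; polars' to_arrow() is zero-copy for most dtypes, and the parameter values here just mirror the repro):

import lightgbm as lgb
import numpy as np
import polars as pl

n = 500
rng = np.random.default_rng(42)
df = pl.DataFrame({"x1": rng.integers(0, 2, size=n), "x2": rng.integers(0, 2, size=n)})
y = df["x1"].to_numpy() + df["x2"].to_numpy() + rng.normal(scale=0.01, size=n)

regressor = lgb.LGBMRegressor(n_estimators=3, num_leaves=4, min_child_samples=1, verbose=-1)

# Convert the polars DataFrame to its underlying pyarrow Table
# and pass that to fit/predict instead of the polars DataFrame itself.
table = df.to_arrow()
regressor.fit(table, y)
preds = regressor.predict(table)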
I don't know what the issue is in your case, but I've had this issue with my model wrapped in sklearn's MultiOutputRegressor. I downgraded sklearn from 1.6.1 to 1.5.2 and it worked.
polars==1.30.0, scikit-learn==1.5.2, lightgbm==4.5.0 (works with 4.6.0 as well)
So from my understanding, converting the polars DataFrame to the underlying pyarrow table (which should be zero-copy in most cases) should work, right? Because in my case I also get this error when doing that (lightgbm 4.6.0, scikit-learn 1.6.0). Downgrading one of the two solves it, though. 😄