[python-package] Fitting on Polars DataFrame fails due to missing setter for feature_names_in_

Open FelicitasTengenQC opened this issue 10 months ago • 5 comments

Description

When fitting a LightGBM model on a polars DataFrame, the call to fit fails with: AttributeError: property 'feature_names_in_' of 'LGBMRegressor' object has no setter

Subclassing the model and redefining the feature_names_in_ property with a setter works around the issue.

The error does not occur when a pandas DataFrame (pandas 2.2.2) is passed instead.
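The failure mode can be reproduced without LightGBM or polars. The minimal sketch below assumes, based only on the error message, that a validation step assigns feature_names_in_ on the estimator while the estimator exposes feature_names_in_ as a getter-only property. ReadOnlyEstimator, PatchedEstimator and record_feature_names are hypothetical stand-ins, not LightGBM or scikit-learn APIs, and the exact AttributeError wording depends on the Python version.

```python
class ReadOnlyEstimator:
    """Hypothetical estimator whose feature_names_in_ is a getter-only property."""

    def __init__(self):
        self._feature_name = None

    @property
    def feature_names_in_(self):
        return self._feature_name


class PatchedEstimator(ReadOnlyEstimator):
    """Re-declares the property with a setter, like PatchedRegressor in the issue."""

    @property
    def feature_names_in_(self):
        return self._feature_name

    @feature_names_in_.setter
    def feature_names_in_(self, names):
        self._feature_name = names


def record_feature_names(estimator, columns):
    """Hypothetical validation step that stores column names on the estimator.

    Returns the stored names, or the AttributeError message if the assignment fails.
    """
    try:
        estimator.feature_names_in_ = list(columns)
    except AttributeError as exc:
        return f"AttributeError: {exc}"
    return estimator.feature_names_in_


print(record_feature_names(ReadOnlyEstimator(), ["x1", "x2"]))
print(record_feature_names(PatchedEstimator(), ["x1", "x2"]))
```

Assigning to a property that defines no setter always raises AttributeError; re-declaring the property with a setter in a subclass is what lets the assignment go through, which matches why PatchedRegressor avoids the error.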

Reproducible example

```python
import lightgbm as lgb
import numpy as np
import polars as pl

n = 500
rng = np.random.default_rng(42)

# Two binary features, stored in a polars DataFrame
data = {"x1": rng.integers(0, 2, size=n), "x2": rng.integers(0, 2, size=n)}

df = pl.DataFrame(data)

# Target with an interaction term plus a little Gaussian noise
y = data["x1"] + data["x2"] + data["x1"] * data["x2"]
y = y + rng.normal(scale=0.01, size=n)


parameters = {
    "learning_rate": 0.1,
    "min_data_in_bin": 1,
    "min_data_in_leaf": 1,
    "num_iterations": 3,
    "num_leaves": 4,
    "verbosity": -1,
}

# This fails with an AttributeError
regressor = lgb.LGBMRegressor(**parameters)
regressor.fit(df, y).predict(df)


# Rerunning with the PatchedRegressor fixes the issue
class PatchedRegressor(lgb.LGBMRegressor):

    # Re-declare feature_names_in_ with a setter so assigning it during fit() no longer raises
    @property
    def feature_names_in_(self):
        return self._feature_name

    @feature_names_in_.setter
    def feature_names_in_(self, x):
        self._feature_name = x


regressor = PatchedRegressor(**parameters)
regressor.fit(df, y).predict(df)
```

Environment info

- LightGBM: 4.6.0
- Polars: 1.22.0
- NumPy: 2.1.3
- Python: 3.11.11

Additional Comments

FelicitasTengenQC, Feb 28 '25

I'll be investigating this week 😅

borchero, Mar 03 '25

Hi, I've just run into this issue as well. Is there any update on it?

I'd be happy to help resolve it if needed 😄

ErikBavenstrand, Apr 08 '25

Sorry, I obviously did not get to this yet 😅

> I'd be happy to help resolve it if needed

Thanks for the offer @ErikBavenstrand! Given that LightGBM does not officially support polars yet (I'm surprised that passing polars DataFrames to LightGBM "just worked" in a past release), I am working on adding proper support, which will be tracked in https://github.com/microsoft/LightGBM/issues/6204.

borchero, Apr 27 '25

Thank you!

I did some further experimentation on version 4.5.0 and found that passing polars DataFrames directly to .fit was significantly slower than passing the underlying Arrow tables. I did not investigate why, but I suspect it is related to how validation datasets are added and converted from polars to LightGBM's internal representation. 😄

ErikBavenstrand, Apr 28 '25

I don't know what the issue is in your case, but I've hit this error with my model wrapped in sklearn's MultiOutputRegressor. Downgrading scikit-learn from 1.6.1 to 1.5.2 fixed it.

Working versions: polars==1.30.0, scikit-learn==1.5.2, lightgbm==4.5.0 (also works with lightgbm 4.6.0)

lethnis, Jun 05 '25

So, as I understand it, converting the polars DataFrame to its underlying pyarrow Table (which should be zero-copy for most columns) should work, right? In my case I get this error when doing that too (lightgbm 4.6.0, scikit-learn 1.6.0). Downgrading either of the two solves it, though. 😄

nejox, Jun 30 '25
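The workarounds reported in this thread come down to version combinations. As a minimal sketch, the hypothetical helper matches_reported_failure (not part of LightGBM or scikit-learn) flags only the combination reported as failing here: LightGBM 4.6.x together with scikit-learn 1.6.x (1.6.0 and 1.6.1 were reported). Treating newer scikit-learn releases the same way is an unconfirmed assumption.

```python
import re
from importlib.metadata import PackageNotFoundError, version


def major_minor(version_string: str) -> tuple[int, int]:
    """Parse the leading "MAJOR.MINOR" of a version string, e.g. "1.6.1" -> (1, 6)."""
    match = re.match(r"(\d+)\.(\d+)", version_string)
    if match is None:
        raise ValueError(f"unrecognised version string: {version_string!r}")
    return int(match.group(1)), int(match.group(2))


def matches_reported_failure(lightgbm_version: str, sklearn_version: str) -> bool:
    """True for LightGBM 4.6.x combined with scikit-learn 1.6 or newer.

    Only scikit-learn 1.6.0 and 1.6.1 were reported in the thread; including
    newer releases is an assumption.
    """
    return major_minor(lightgbm_version) == (4, 6) and major_minor(sklearn_version) >= (1, 6)


if __name__ == "__main__":
    try:
        installed = (version("lightgbm"), version("scikit-learn"))
    except PackageNotFoundError:
        print("lightgbm and/or scikit-learn is not installed")
    else:
        if matches_reported_failure(*installed):
            print("This combination was reported to fail when fitting on polars/pyarrow input.")
```

The workaround reported in the thread is to pin scikit-learn below 1.6 (for example scikit-learn==1.5.2) until proper polars support lands.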