Feature names aren't set if fit on `polars.DataFrame`

Open mlondschien opened this issue 11 months ago • 6 comments

Does glum support polars.DataFrames? I thought it did, but feature names appear to get lost somewhere when fitting on one directly:

import glum
import tabmat
import polars as pl
import numpy as np

df = pl.DataFrame({"a": [1, 2, 3], "b": [4, 5, 6]})
y = np.array([2, 2, 2])

glm = glum.GeneralizedLinearRegressor()
glm.fit(df, y)

print(f"feature_names_: {glm.feature_names_}")
print(f"feature_names_in_: {glm.feature_names_in_}")
print(f"coef_table:\n{glm.coef_table()}")
print()

glm.fit(tabmat.from_df(df), y)

print(f"feature_names_: {glm.feature_names_}")
print(f"feature_names_in_: {glm.feature_names_in_}")
print(f"coef_table:\n{glm.coef_table()}")

prints

feature_names_: ['_col_0', '_col_1']
feature_names_in_: ['a' 'b']
coef_table:
intercept    2.0
_col_0       0.0
_col_1       0.0
Name: coef, dtype: float64

feature_names_: [np.str_('a'), np.str_('b')]
feature_names_in_: ['a' 'b']
coef_table:
intercept    2.0
a            0.0
b            0.0
Name: coef, dtype: float64

This is with glum=3.1.0 and tabmat=4.1.0.

mlondschien avatar Jan 04 '25 10:01 mlondschien

Not properly yet, but there is some ongoing work. Tabmat does support them already, though.
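Until then, a possible stopgap is to route the dataframe through tabmat before fitting, as the second fit in the reproducer does. A minimal sketch, assuming `tabmat.from_df` accepts the dataframe as it did in the report; `fit_via_tabmat` and its `to_matrix` parameter are hypothetical names for illustration, not part of glum or tabmat:

```python
def fit_via_tabmat(model, df, y, to_matrix=None):
    """Fit `model` on a dataframe after converting it to a tabmat matrix.

    `fit_via_tabmat` and `to_matrix` are hypothetical names used only in
    this sketch; they are not glum or tabmat API.
    """
    if to_matrix is None:
        # Imported lazily so the helper can be defined without tabmat installed.
        import tabmat

        # Same conversion call as in the reproducer's second fit.
        to_matrix = tabmat.from_df
    model.fit(to_matrix(df), y)
    return model
```

With the reproducer's `df` and `y`, `fit_via_tabmat(glum.GeneralizedLinearRegressor(), df, y)` corresponds to the second fit above, which reported `a` and `b` in `coef_table()` on glum 3.1.0 and tabmat 4.1.0.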

stanmart avatar Jan 04 '25 10:01 stanmart

Thanks @stanmart! I'll close the issue.

mlondschien avatar Jan 04 '25 11:01 mlondschien

Could we reopen this issue, or has it been solved (and if so, by which PR)?

lorentzenchr avatar Mar 18 '25 07:03 lorentzenchr

Sure, we can reopen this issue. The more general "problem" is that glum does not support (or claim to support) polars dataframes yet. The fact that they still kinda work with some minor issues is just a happy (?) coincidence.

Polars (and other dataframe type) support is coming soon, though; we are waiting for some upstream features in formulaic.

stanmart avatar Mar 19 '25 10:03 stanmart

Can you reference the upstream issue in formulaic?

mlondschien avatar Mar 27 '25 17:03 mlondschien

The formulaic PR is already merged, so it's just a matter of waiting for the release now. There is a narwhals feature that would also help a lot with the implementation.

stanmart avatar Mar 31 '25 05:03 stanmart