Feature names aren't set when fitting on a `polars.DataFrame`
Does glum support polars DataFrames? I thought it did. It appears that the feature names are lost somewhere:
```python
import glum
import tabmat
import polars as pl
import numpy as np

df = pl.DataFrame({"a": [1, 2, 3], "b": [4, 5, 6]})
y = np.array([2, 2, 2])
glm = glum.GeneralizedLinearRegressor()

# Fit directly on the polars DataFrame
glm.fit(df, y)
print(f"feature_names_: {glm.feature_names_}")
print(f"feature_names_in_: {glm.feature_names_in_}")
print(f"coef_table:\n{glm.coef_table()}")
print()

# Fit on a tabmat matrix built from the same DataFrame
glm.fit(tabmat.from_df(df), y)
print(f"feature_names_: {glm.feature_names_}")
print(f"feature_names_in_: {glm.feature_names_in_}")
print(f"coef_table:\n{glm.coef_table()}")
```
This prints:

```
feature_names_: ['_col_0', '_col_1']
feature_names_in_: ['a' 'b']
coef_table:
intercept    2.0
_col_0       0.0
_col_1       0.0
Name: coef, dtype: float64

feature_names_: [np.str_('a'), np.str_('b')]
feature_names_in_: ['a' 'b']
coef_table:
intercept    2.0
a            0.0
b            0.0
Name: coef, dtype: float64
```
This is with glum=3.1.0 and tabmat=4.1.0.
Not properly yet, but work on it is ongoing. Tabmat already supports them, though.
Thanks @stanmart! I'll close the issue.
Could we reopen this issue, or has it been solved (and if so, by which PR)?
Sure, we can reopen this issue. The more general "problem" is that glum does not support (or claim to support) polars dataframes yet. The fact that they still kinda work with some minor issues is just a happy (?) coincidence.
Support for polars (and other dataframe types) is coming soon, though; we are waiting for some upstream features in formulaic.
Can you reference the upstream issue in formulaic?
The formulaic PR is already merged, so it's just a matter of waiting for the release now. There is a narwhals feature that would also help a lot with the implementation.