glum
Several warnings and errors in coef_table
With glum version 2.6.0:
from glum import GeneralizedLinearRegressor
import pandas as pd
import numpy as np
X, y = pd.DataFrame({"x": np.arange(3)}), np.ones(3)
glm = GeneralizedLinearRegressor(family="poisson").fit(X, y)
glm.coef_table(X=X, y=y)
results in
python3.9/site-packages/glum/_glm.py:1909: UserWarning: Covariance matrix estimation assumes that the model is not penalized. You are estimating a penalized model. The covariance matrix will be incorrect.
warnings.warn(
python3.9/site-packages/glum/_util.py:37: FutureWarning: is_categorical_dtype is deprecated and will be removed in a future version. Use isinstance(dtype, CategoricalDtype) instead
if pd.api.types.is_categorical_dtype(dtype) and (column in df)
---------------------------------------------------------------------------
NotImplementedError Traceback (most recent call last)
Cell In[1], line 8
6 X, y = pd.DataFrame({"x": np.arange(3)}), np.ones(3)
7 glm = GeneralizedLinearRegressor(family="poisson").fit(X, y)
----> 8 glm.coef_table(X=X, y=y)
File python3.9/site-packages/glum/_glm.py:1397, in GeneralizedLinearRegressorBase.coef_table(self, confidence_level, X, y, mu, offset, sample_weight, dispersion, robust, clusters, expected_information)
1394 names = self.feature_names_
1395 beta = self.coef_
-> 1397 covariance_matrix = self.covariance_matrix(
1398 X=X,
1399 y=y,
1400 mu=mu,
1401 offset=offset,
1402 sample_weight=sample_weight,
1403 dispersion=dispersion,
1404 robust=robust,
1405 clusters=clusters,
1406 expected_information=expected_information,
1407 )
1409 significance_level = 1 - confidence_level
1411 std_errors = np.sqrt(np.diag(covariance_matrix))
File python3.9/site-packages/glum/_glm.py:1965, in GeneralizedLinearRegressorBase.covariance_matrix(self, X, y, mu, offset, sample_weight, dispersion, robust, clusters, expected_information, store_covariance_matrix, skip_checks)
1953 X, y = check_X_y_tabmat_compliant(
1954 X,
1955 y,
(...)
1961 drop_first=self.drop_first,
1962 )
1964 if isinstance(X, np.ndarray):
-> 1965 X = tm.DenseMatrix(X)
1966 if sparse.issparse(X) and not isinstance(X, tm.SparseMatrix):
1967 X = tm.SparseMatrix(X)
File python3.9/site-packages/tabmat/dense_matrix.py:44, in DenseMatrix.__new__(cls, input_array)
42 obj = np.asarray(input_array).view(cls)
43 if not np.issubdtype(obj.dtype, np.floating):
---> 44 raise NotImplementedError("DenseMatrix is only implemented for float data")
45 return obj
NotImplementedError: DenseMatrix is only implemented for float data
The first UserWarning seems like a false alarm: the GLM is unpenalized!
The FutureWarning is already fixed in main, #711.
The NotImplementedError is unexpected.
The first UserWarning seems like a false alarm: the GLM is unpenalized!
Note that glum defaults to a regularised GLM (ref). As @jtilly mentioned in #730, it isn't a good default.
The NotImplementedError is unexpected.
That's a shortcoming in tabmat: it chokes when all columns have integer dtype. If the data frame contains any floating-point columns, the conversion succeeds. It should be easy to fix. Thanks for reporting! :)
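The check that raises is visible in the traceback (`np.issubdtype(obj.dtype, np.floating)`). Below is a minimal sketch of why an all-integer frame fails it while a mixed frame passes, plus a float cast as a possible workaround until tabmat is fixed. It assumes the frame reaches tabmat as a single NumPy array, roughly as `DataFrame.to_numpy()` would produce; the column `z` and the helper name are made up for illustration.

```python
import numpy as np
import pandas as pd


def passes_dense_matrix_check(df):
    """Mirror the check from the traceback: the array dtype must be floating.

    Assumption: the frame is converted to one array, similar to df.to_numpy().
    """
    return bool(np.issubdtype(df.to_numpy().dtype, np.floating))


# Integer column only, as in the report.
X_int = pd.DataFrame({"x": np.arange(3)})
# Hypothetical extra float column: the common dtype becomes float64.
X_mixed = pd.DataFrame({"x": np.arange(3), "z": [0.5, 1.0, 1.5]})
# Possible workaround: cast everything to float up front.
X_cast = X_int.astype(float)

print(passes_dense_matrix_check(X_int))    # False
print(passes_dense_matrix_check(X_mixed))  # True
print(passes_dense_matrix_check(X_cast))   # True
```

Passing `X_cast` instead of `X` to `coef_table` should sidestep the error, though that call is not exercised here.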
Note that glum defaults to a regularised GLM (ref).
Should we change the default alpha in version 3 to alpha=0?
Should we change the default alpha in version 3 to alpha=0?
There was a similar discussion for scikit-learn a few years ago: article, HN thread. There are also a number of Reddit and stats Stack Exchange threads from people surprised by the default alpha.
Fix coming as part of Glum 3.