
Several warnings and errors in coef_table

Open lorentzenchr opened this issue 1 year ago • 4 comments

With glum version 2.6.0:

from glum import GeneralizedLinearRegressor
import pandas as pd
import numpy as np


X, y = pd.DataFrame({"x": np.arange(3)}), np.ones(3)
glm = GeneralizedLinearRegressor(family="poisson").fit(X, y)
glm.coef_table(X=X, y=y)

results in

python3.9/site-packages/glum/_glm.py:1909: UserWarning: Covariance matrix estimation assumes that the model is not penalized. You are estimating a penalized model. The covariance matrix will be incorrect.
  warnings.warn(
python3.9/site-packages/glum/_util.py:37: FutureWarning: is_categorical_dtype is deprecated and will be removed in a future version. Use isinstance(dtype, CategoricalDtype) instead
  if pd.api.types.is_categorical_dtype(dtype) and (column in df)

---------------------------------------------------------------------------
NotImplementedError                       Traceback (most recent call last)
Cell In[1], line 8
      6 X, y = pd.DataFrame({"x": np.arange(3)}), np.ones(3)
      7 glm = GeneralizedLinearRegressor(family="poisson").fit(X, y)
----> 8 glm.coef_table(X=X, y=y)

File python3.9/site-packages/glum/_glm.py:1397, in GeneralizedLinearRegressorBase.coef_table(self, confidence_level, X, y, mu, offset, sample_weight, dispersion, robust, clusters, expected_information)
   1394     names = self.feature_names_
   1395     beta = self.coef_
-> 1397 covariance_matrix = self.covariance_matrix(
   1398     X=X,
   1399     y=y,
   1400     mu=mu,
   1401     offset=offset,
   1402     sample_weight=sample_weight,
   1403     dispersion=dispersion,
   1404     robust=robust,
   1405     clusters=clusters,
   1406     expected_information=expected_information,
   1407 )
   1409 significance_level = 1 - confidence_level
   1411 std_errors = np.sqrt(np.diag(covariance_matrix))

File python3.9/site-packages/glum/_glm.py:1965, in GeneralizedLinearRegressorBase.covariance_matrix(self, X, y, mu, offset, sample_weight, dispersion, robust, clusters, expected_information, store_covariance_matrix, skip_checks)
   1953 X, y = check_X_y_tabmat_compliant(
   1954     X,
   1955     y,
   (...)
   1961     drop_first=self.drop_first,
   1962 )
   1964 if isinstance(X, np.ndarray):
-> 1965     X = tm.DenseMatrix(X)
   1966 if sparse.issparse(X) and not isinstance(X, tm.SparseMatrix):
   1967     X = tm.SparseMatrix(X)

File python3.9/site-packages/tabmat/dense_matrix.py:44, in DenseMatrix.__new__(cls, input_array)
     42 obj = np.asarray(input_array).view(cls)
     43 if not np.issubdtype(obj.dtype, np.floating):
---> 44     raise NotImplementedError("DenseMatrix is only implemented for float data")
     45 return obj

NotImplementedError: DenseMatrix is only implemented for float data

The first UserWarning seems like a false alarm: the GLM is unpenalized!
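Some context on what that warning guards against. For an unpenalised Poisson GLM with log link and dispersion fixed at 1, the usual non-robust covariance of the coefficients is the inverse Fisher information, (X^T diag(mu) X)^-1. The standard errors are the square roots of its diagonal, which matches the `np.sqrt(np.diag(covariance_matrix))` step in the traceback. The NumPy sketch below computes this on toy data. It is not glum's implementation, and the helper name `poisson_covariance` is hypothetical.

```python
import numpy as np

def poisson_covariance(X, mu):
    """Hypothetical helper: non-robust covariance of unpenalised Poisson GLM
    coefficients (log link, dispersion = 1), i.e. (X^T diag(mu) X)^-1.
    X must already contain an intercept column."""
    fisher = (X.T * mu) @ X  # X^T diag(mu) X without building diag(mu)
    return np.linalg.inv(fisher)

# Toy data with y = 2**x exactly, so the unpenalised fit reproduces y (mu == y).
x = np.arange(4.0)
X = np.column_stack([np.ones_like(x), x])
mu = 2.0 ** x
cov = poisson_covariance(X, mu)
std_errors = np.sqrt(np.diag(cov))  # same step as in coef_table
print(std_errors)
```

An L2 penalty adds an extra alpha term to the curvature of the objective, and this formula ignores it. The formula therefore no longer describes a penalised estimator, which is the kind of mismatch the warning points at.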

The FutureWarning is already fixed in main, #711.

The NotImplementedError comes unexpected.

lorentzenchr, Nov 05 '23 10:11

> The first UserWarning seems like a false alarm: the GLM is unpenalized!

Note that glum defaults to a regularised GLM (ref). As @jtilly mentioned in #730, it isn't a good default.
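The NumPy sketch below shows what a default penalty does to the estimates. It fits Poisson regression with an L2 (ridge) penalty using Newton's method. It is not glum's solver. The objective (mean negative log-likelihood plus (alpha/2)·slope², intercept unpenalised) and the name `fit_poisson_ridge` are assumptions for illustration only, and glum's exact objective scaling and default penalty mix may differ.

```python
import numpy as np

def fit_poisson_ridge(x, y, alpha, n_iter=50):
    """Hypothetical helper: Newton's method for Poisson regression (log link)
    minimising mean(mu - y * eta) + (alpha / 2) * slope**2.
    The intercept is not penalised (assumed objective, not glum's)."""
    n = len(y)
    X = np.column_stack([np.ones(n), x])      # intercept column + one feature
    pen = np.array([0.0, alpha])              # penalty weight per coefficient
    beta = np.array([np.log(y.mean()), 0.0])  # start from the intercept-only fit
    for _ in range(n_iter):
        mu = np.exp(X @ beta)
        grad = X.T @ (mu - y) / n + pen * beta
        hess = (X.T * mu) @ X / n + np.diag(pen)
        beta = beta - np.linalg.solve(hess, grad)
    return beta

x = np.arange(4.0)
y = 2.0 ** x  # y = exp(0 + x * log 2) exactly
print(fit_poisson_ridge(x, y, alpha=0.0))  # approximately [0.0, 0.693]
print(fit_poisson_ridge(x, y, alpha=1.0))  # slope strictly between 0 and log 2
```

With alpha=0 the fit recovers the exact log-linear relationship. Any alpha > 0 shrinks the slope towards zero, so a user who never set alpha still gets shrunken coefficients.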

lbittarello, Nov 06 '23 09:11

> The NotImplementedError comes unexpected.

That's a shortcoming in tabmat: it fails when all columns have an integer dtype. If the data frame contains at least one floating-point column, the conversion succeeds. It should be easy to fix. Thanks for reporting! :)
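The dtype behaviour described above can be reproduced with pandas alone. The traceback shows that X is already a NumPy array when `tm.DenseMatrix(X)` is called. This sketch assumes the conversion behaves like `DataFrame.to_numpy()`, but glum's exact conversion path is not shown, so treat it as an approximation. A frame with only integer columns yields an integer array, which fails the `np.issubdtype(obj.dtype, np.floating)` check. Adding a single float column upcasts the whole array to float64.

```python
import numpy as np
import pandas as pd

ints_only = pd.DataFrame({"x": np.arange(3)})
mixed = ints_only.assign(z=np.zeros(3))  # add one float64 column

# DenseMatrix only accepts floating dtypes (see the check in the traceback).
print(ints_only.to_numpy().dtype)  # an integer dtype, e.g. int64
print(mixed.to_numpy().dtype)      # float64: ints are upcast to the common type

# Workaround sketch: cast the features to float before handing them over.
X_float = ints_only.astype(float)
print(np.issubdtype(X_float.to_numpy().dtype, np.floating))  # True
```

By the same reasoning, calling `glm.coef_table(X=X.astype(float), y=y)` should presumably avoid the error until the tabmat fix lands.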

lbittarello, Nov 06 '23 09:11

> Note that glum defaults to a regularised GLM (ref).

Should we change the default alpha in version 3 to alpha=0?

> Should we change the default alpha in version 3 to alpha=0?

There was a similar discussion for scikit-learn a few years ago (article, HN thread). There are also plenty of Reddit and Stats Stack Exchange threads from people surprised by the default alpha.

stanmart, Jan 23 '24 09:01

A fix is coming as part of glum 3.

lbittarello, Apr 03 '24 14:04