[BUG] `NGBoostRegressor` failing when `dist="TDistribution"`
Describe the bug
In the `gradient_boosting` module, which wraps ngboost's `NGBRegressor` in skpro as `NGBoostRegressor`, the `TDistribution` option fails to run as expected. It raises errors like:
raise LinAlgError("Singular matrix")
numpy.linalg.LinAlgError: Singular matrix
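For context, this is the error NumPy raises when a linear solve is attempted on a matrix that has no inverse. The sketch below reproduces the same message in isolation. It is only an illustration: the names `metric` and `grad` are hypothetical, and the assumption that the failure comes from a linear solve during ngboost's natural-gradient step has not been verified against the full traceback.

```python
import numpy as np

# Hypothetical stand-ins for a metric matrix and a gradient vector.
# The second row is exactly twice the first, so the matrix is singular.
metric = np.array([[1.0, 2.0], [2.0, 4.0]])
grad = np.array([1.0, 2.0])

try:
    np.linalg.solve(metric, grad)
    outcome = "solved"
except np.linalg.LinAlgError as exc:
    # NumPy reports an exactly singular system with this exception.
    outcome = str(exc)

print(outcome)
```

If the assumption holds, a degenerate or poorly estimated metric for the t-distribution parameters would produce exactly this failure, regardless of which library calls the solver.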
To Reproduce
Both sklearn's diabetes dataset and the breast_cancer dataset produce the same `Singular matrix` error. To reproduce:
from sklearn.datasets import load_diabetes
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error
from skpro.regression.gradient_boosting import NGBoostRegressor
# step 1: data specification
X, y = load_diabetes(return_X_y=True, as_frame=True)
X_train, X_test, Y_train, Y_test = train_test_split(X, y)
ngb = NGBoostRegressor(dist="TDistribution")._fit(X_train, Y_train)
Y_preds = ngb._predict(X_test)
Y_dists = ngb._pred_dist(X_test)
print(Y_dists)
Y_pred_proba = ngb.predict_proba(X_test)
print(Y_pred_proba)
# test Mean Squared Error
test_MSE = mean_squared_error(Y_preds, Y_test)
print('Test MSE', test_MSE)
# test Negative Log Likelihood
test_NLL = -Y_dists.logpdf(Y_test).mean()
print('Test NLL', test_NLL)
Expected behavior
The expected output should look something like this:
[iter 0] loss=5.7260 val_loss=0.0000 scale=1.0000 norm=62.6096
[iter 100] loss=5.3862 val_loss=0.0000 scale=1.0000 norm=44.7994
[iter 200] loss=5.1347 val_loss=0.0000 scale=2.0000 norm=70.8354
[iter 300] loss=4.9709 val_loss=0.0000 scale=1.0000 norm=31.4283
[iter 400] loss=4.8448 val_loss=0.0000 scale=2.0000 norm=57.8725
<ngboost.distns.t.TDistribution object at 0x7a306649f010>
TDistribution(columns=Index(['target'], dtype='object'),
index=Index([394, 76, 398, 154, 164, 409, 86, 57, 248, 252,
...
337, 16, 115, 134, 158, 256, 315, 7, 292, 119],
dtype='int64', length=111),
mu= 0
0 204.242902
1 159.767290
2 180.299182
3 157.156834
4 132.029658
.. ...
106 207.598136
107 111.282266
108 142.690431
109 82.266164
110 144.789344
[111 rows x 1 columns],
sigma= 0
0 22.784403
1 26.722443
2 41.334656
3 32.130065
4 23.862477
.. ...
106 31.425179
107 33.441920
108 24.632183
109 26.791969
110 34.908296
[111 rows x 1 columns])
Test MSE 4077.414567879142
Test NLL 6.473540253400317
Environment
Python 3.11.8, ngboost 0.5.1
Additional context
The open question is whether the problem lies in the interfacing, i.e. the skpro API, or is genuinely a bug in ngboost's `TDistribution` itself.
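One way to narrow this down is to fit the same data once through ngboost directly and once through the skpro wrapper, and compare the outcomes. The following is a minimal sketch, not a confirmed diagnosis. The helper names `run_and_classify`, `fit_ngboost_directly`, and `fit_via_skpro` are hypothetical, and it assumes that ngboost exposes its Student-t distribution as `ngboost.distns.T` and that this is the class skpro's `"TDistribution"` option maps to.

```python
import numpy as np


def run_and_classify(fit_fn):
    """Call fit_fn() and report whether it hit NumPy's singular-matrix error."""
    try:
        fit_fn()
    except np.linalg.LinAlgError as exc:
        return f"LinAlgError: {exc}"
    return "ok"


def fit_ngboost_directly(X, y):
    # Imported lazily so the helper above works without ngboost installed.
    # Assumption: the Student-t distribution is available as ngboost.distns.T.
    from ngboost import NGBRegressor
    from ngboost.distns import T

    return NGBRegressor(Dist=T, verbose=False).fit(X, y)


def fit_via_skpro(X, y):
    # Same data, routed through the skpro interface named in this report.
    from skpro.regression.gradient_boosting import NGBoostRegressor

    return NGBoostRegressor(dist="TDistribution").fit(X, y)
```

With `X_train` and `Y_train` from the reproduction script, compare `run_and_classify(lambda: fit_ngboost_directly(X_train, Y_train))` against `run_and_classify(lambda: fit_via_skpro(X_train, Y_train))`. If both report a `LinAlgError`, the problem is more likely upstream in ngboost. If only the skpro call fails, the interface is the more likely culprit.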