
"Singular matrix" error when I use normal distribution or negative-binomial

Open hrkadkhodaei opened this issue 3 years ago • 9 comments

When I run the following code snippet I get the error "numpy.linalg.LinAlgError: Singular matrix":

```python
X_train, y_train, X_test, y_test = read_data(InEx)

model = XGBDistribution(distribution="normal", n_estimators=500)
model.fit(X_train, y_train, eval_set=[(X_test, y_test)], early_stopping_rounds=10)
```

The full error:

```
D:\Python37\lib\site-packages\xgboost_distribution\distributions\normal.py:89: RuntimeWarning: overflow encountered in exp
D:\Python37\lib\site-packages\xgboost_distribution\distributions\normal.py:61: RuntimeWarning: overflow encountered in exp
Traceback (most recent call last):
  File "D:\Python37\lib\contextlib.py", line 130, in __exit__
    self.gen.throw(type, value, traceback)
  File "D:\Python37\lib\site-packages\xgboost\config.py", line 140, in config_context
    yield
  File "D:\Python37\lib\site-packages\xgboost_distribution\model.py", line 181, in fit
    callbacks=callbacks,
  File "D:\Python37\lib\site-packages\xgboost\training.py", line 196, in train
    early_stopping_rounds=early_stopping_rounds)
  File "D:\Python37\lib\site-packages\xgboost\training.py", line 81, in _train_internal
    bst.update(dtrain, i, obj)
  File "D:\Python37\lib\site-packages\xgboost\core.py", line 1685, in update
    grad, hess = fobj(pred, dtrain)
  File "D:\Python37\lib\site-packages\xgboost_distribution\model.py", line 254, in obj
    y=y, params=params, natural_gradient=self.natural_gradient
  File "D:\Python37\lib\site-packages\xgboost_distribution\distributions\normal.py", line 72, in gradient_and_hessian
    grad = np.linalg.solve(fisher_matrix, grad)
  File "<__array_function__ internals>", line 6, in solve
  File "D:\Python37\lib\site-packages\numpy\linalg\linalg.py", line 394, in solve
    r = gufunc(a, b, signature=signature, extobj=extobj)
  File "D:\Python37\lib\site-packages\numpy\linalg\linalg.py", line 88, in _raise_linalgerror_singular
    raise LinAlgError("Singular matrix")
numpy.linalg.LinAlgError: Singular matrix

Process finished with exit code 1
```

The training and test data contain 13 float features (X) and 1 integer target (y).

hrkadkhodaei avatar Feb 03 '22 11:02 hrkadkhodaei

I get the same error as above when fitting a 430k rows dataset with 31 columns, but the same dataset scaled down to 43k rows works.

aleksaw avatar Nov 09 '22 13:11 aleksaw

I got the same error. I tried setting the parameter `natural_gradient` to `False` to skip the line `grad = np.linalg.solve(fisher_matrix, grad)` that causes this, but then this warning appears:

```
C:\Users\scyperski\Anaconda3\envs\cost_prediction\lib\site-packages\xgboost_distribution\distributions\normal.py:65: RuntimeWarning: divide by zero encountered in divide
  grad[:, 0] = (loc - y) / var
C:\Users\scyperski\Anaconda3\envs\cost_prediction\lib\site-packages\xgboost_distribution\distributions\normal.py:66: RuntimeWarning: divide by zero encountered in divide
  grad[:, 1] = 1 - ((y - loc) ** 2) / var
C:\Users\scyperski\Anaconda3\envs\cost_prediction\lib\site-packages\xgboost_distribution\distributions\normal.py:78: RuntimeWarning: divide by zero encountered in divide
  hess[:, 0] = 1 / var
C:\Users\scyperski\Anaconda3\envs\cost_prediction\lib\site-packages\xgboost_distribution\distributions\normal.py:79: RuntimeWarning: divide by zero encountered in divide
  hess[:, 1] = 2 * ((y - loc) ** 2) / var
```

As a result, the predictions are full of NaNs. It seems to be caused by the `log_scale` array (in the `gradient_and_hessian` method), whose elements are so small that they are rounded to 0 after `var = np.exp(2 * log_scale)`.
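The mechanism can be illustrated with a small standalone numpy snippet (the 2x2 `fisher_matrix` layout here is only a schematic stand-in for the library's internals): a very negative `log_scale` underflows `exp()` to exactly 0, and any Fisher matrix containing that zero pivot is singular, which is what `np.linalg.solve` then complains about.

```python
import numpy as np

# If the boosted log_scale drifts to a very negative value,
# exp() underflows to exactly 0.0 ...
log_scale = np.array([-400.0])
var = np.exp(2 * log_scale)
print(var)  # [0.]

# ... and a Fisher-information-like matrix built from it has a
# zero pivot, so solving against it raises LinAlgError:
fisher_matrix = np.array([[var[0], 0.0],
                          [0.0,    2.0]])
try:
    np.linalg.solve(fisher_matrix, np.ones(2))
except np.linalg.LinAlgError as err:
    print(err)  # Singular matrix
```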

As a workaround I added this line before calculating the exponential: `log_scale = np.clip(log_scale, -20, 20)`

So far it works even with natural_gradient parameter as True.
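A minimal sketch of this clipping workaround as a helper function (the `safe_var` name and the `limit` parameter are mine, not from the library):

```python
import numpy as np

def safe_var(log_scale, limit=20.0):
    """Clamp log_scale before exponentiating, so exp() neither
    underflows to 0 (singular Fisher matrix) nor overflows to inf."""
    clipped = np.clip(log_scale, -limit, limit)
    return np.exp(2 * clipped)

# Extreme values now stay finite and strictly positive:
print(safe_var(np.array([-400.0, 0.0, 400.0])))
```

With `limit=20`, the variance is bounded between roughly `exp(-40)` (about 4e-18) and `exp(40)` (about 2e17), which keeps both the gradients and the Fisher matrix numerically well-behaved.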

CyperStone avatar Jan 27 '23 15:01 CyperStone

Hi, thanks for raising and debugging this. Does anyone have an example dataset / method of fitting where this happens?

CDonnerer avatar Feb 13 '23 19:02 CDonnerer

I got the permission from my workplace to share a sample dataset after its anonymization. I also prepared a minimal code snippet to reproduce this problem. Please contact me at [email protected] (dataset is quite heavy).

CyperStone avatar Feb 22 '23 13:02 CyperStone

Thanks, appreciate this. I have a slight preference for finding a public dataset, just so it's easier to add to the test suite, so I'll have a look at this first and get back to you if I can't reproduce.

CDonnerer avatar Mar 06 '23 21:03 CDonnerer

Okay, I was able to reproduce the error with some datasets and merged a fix (#86) which is available in the latest release (xgboost-distribution==0.2.7). However, depending on the data, there could still be issues here, so please let me know if this error still occurs.

CDonnerer avatar Mar 12 '23 16:03 CDonnerer

Still got the same issue with negative-binomial. If this is still being maintained, let me know and I'll get an MRE together.

jackguac avatar Jun 11 '24 15:06 jackguac

I've similarly found that the size of the dataset makes a difference. Up to about 40k rows is fine; above that the error occurs. It doesn't seem related to the contents of the dataset (e.g. for a 1M-row dataset, each 40k-row chunk is fine on its own, but passing them in together causes the error).

jackguac avatar Jun 11 '24 18:06 jackguac

Yes, it is still maintained. Do you have any details on the error that you're seeing (or data for reproducible example)? The above was related to numeric overflow errors, so if that's the issue, it may just need safer limits for negative-binomial.

CDonnerer avatar Jun 12 '24 20:06 CDonnerer