Overflow warnings

Open ChristianMichelsen opened this issue 4 years ago • 18 comments

This package looks so promising!

I am just testing it out on my dataset with dimensions (N, M) = (57795, 144). At first I tried with N=100, N=1000, and N=10_000 and it worked well. Now I am trying to run it on all N=57_795 and I am encountering some overflow errors, see below. Is this something to be worried about?

[iter 0] loss=2.6377 val_loss=0.0000 scale=0.1250 norm=0.3378
~/miniconda3/envs/py37/lib/python3.7/site-packages/ngboost/distns/normal.py:13: RuntimeWarning:

overflow encountered in exp

~/miniconda3/envs/py37/lib/python3.7/site-packages/ngboost/distns/normal.py:14: RuntimeWarning:

overflow encountered in square

Cheers, Christian

ChristianMichelsen avatar Jan 10 '20 16:01 ChristianMichelsen

Hey @ChristianMichelsen, the issue looks like it's due to a large estimate of log(sigma), which then blows up when exponentiated. Could you try normalizing/min-max scaling your Y and let me know if you still get the error? If that doesn't fix it I'll need some more information about the data to be of any help.

alejandroschuler avatar Jan 10 '20 16:01 alejandroschuler

I'd also maybe try different subsets of the data to see if it's particular rows that blow it up.

alejandroschuler avatar Jan 10 '20 16:01 alejandroschuler
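To make the log(sigma) explanation above concrete, here is a minimal sketch using plain NumPy (not ngboost code). It shows where `exp` and the subsequent square leave the float64 range; the variable names are illustrative only.

```python
import numpy as np

# Largest finite float64 and the largest log-scale that exp() can still represent.
max_float = np.finfo(np.float64).max
max_log_scale = np.log(max_float)  # roughly 709.78

with np.errstate(over="ignore"):
    # exp() overflows to inf once log(sigma) goes past roughly 709.78 ...
    scale_ok = np.exp(np.float64(709.0))
    scale_overflow = np.exp(np.float64(710.0))
    # ... while squaring the scale overflows at about half that log-scale.
    var_ok = np.exp(np.float64(354.0)) ** 2
    var_overflow = np.exp(np.float64(356.0)) ** 2

print(max_log_scale, scale_ok, scale_overflow, var_ok, var_overflow)
```

So a "square" overflow warning can appear well before an "exp" overflow warning, which would be consistent with logs in which the square warning is the more frequent of the two.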

My Y is already in the range from 0.1 to 68, so that shouldn't be too bad, should it? Good idea about checking different subsets; I will report back once I have tested that. Thanks!

ChristianMichelsen avatar Jan 13 '20 10:01 ChristianMichelsen

hey @ChristianMichelsen any update on this issue? I'd like to close it if you figured it out.

alejandroschuler avatar Jan 21 '20 01:01 alejandroschuler

I tried using only a subset of the data which helped (the errors only came up in the first 100 iterations and not afterwards), however, I am yet to try it on the whole dataset. I will return in case I run into more problems, so please feel free to close the issue.

And thanks again for an interesting new package!

ChristianMichelsen avatar Jan 22 '20 11:01 ChristianMichelsen

Ok, keep us updated!

alejandroschuler avatar Jan 22 '20 19:01 alejandroschuler

Hey! I am having the exact same issue, and I am not sure how to isolate the rows/samples causing this issue. My output label is min-max normalized from 0 to 1 as well, so I am not sure what's happening. Any help?

astrogilda avatar Sep 02 '20 07:09 astrogilda

Try updating the package to the latest release; we might have fixed this inadvertently. Otherwise I'm not sure.

alejandroschuler avatar Sep 02 '20 15:09 alejandroschuler

Using the latest release. Still the same issue. Are there locations in the code where I can put print statements so I know exactly what is going on that's resulting in this warning?

astrogilda avatar Sep 02 '20 15:09 astrogilda
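One generic way to find out where a NumPy warning comes from, without editing library code, is to promote `RuntimeWarning` to an exception so that Python produces a full traceback. The sketch below assumes default NumPy error settings; `find_overflow` and `toy_fit` are hypothetical helper names, and `toy_fit` is only a stand-in for a real model fit.

```python
import warnings

import numpy as np


def find_overflow(fit_fn):
    """Run fit_fn() with RuntimeWarnings promoted to exceptions.

    Returns None if nothing was raised, otherwise the exception, whose
    traceback points at the operation that triggered the warning.
    """
    with warnings.catch_warnings():
        warnings.simplefilter("error", category=RuntimeWarning)
        try:
            fit_fn()
        except RuntimeWarning as exc:
            return exc
    return None


def toy_fit():
    # Stand-in for a model fit: the second log-scale is too large for exp().
    log_scale = np.array([1.0, 800.0])
    return np.exp(log_scale)


err = find_overflow(toy_fit)
print(repr(err))
```

Calling the helper with a fit on different row subsets is one way to narrow down which samples are involved. Note that a filter set in the parent process may not carry over to parallel worker processes, so running single-process while debugging is probably simpler.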

I also have a normalized dataset with target values in [0;1], and I also get the overflow errors:

I get no errors in my first fit-predict round 0/59.. But so it begins:

NGBRegressor_deltas_wtrained_scale30468.666666666668_fr012
0%| | 0/59 [00:00<?, ?it/s]
/usr/local/lib/python3.8/dist-packages/sklearn/utils/validation.py:72: DataConversionWarning: A column-vector y was passed when a 1d array was expected. Please change the shape of y to (n_samples, ), for example using ravel().
  return f(**kwargs)
[iter 0] loss=-1.0729 val_loss=0.0000 scale=1.0000 norm=0.7415
2%|▏ | 1/59 [01:01<59:14, 61.28s/it]
/usr/local/lib/python3.8/dist-packages/sklearn/utils/validation.py:72: DataConversionWarning: A column-vector y was passed when a 1d array was expected. Please change the shape of y to (n_samples, ), for example using ravel().
  return f(**kwargs)
/usr/local/lib/python3.8/dist-packages/ngboost/distns/normal.py:67: RuntimeWarning: overflow encountered in exp
  self.scale = np.exp(params[1])
/usr/local/lib/python3.8/dist-packages/ngboost/distns/normal.py:68: RuntimeWarning: overflow encountered in square
  self.var = self.scale ** 2
/usr/local/lib/python3.8/dist-packages/scipy/stats/_distn_infrastructure.py:1802: RuntimeWarning: divide by zero encountered in true_divide
  x = np.asarray((x - loc)/scale, dtype=dtyp)
/usr/local/lib/python3.8/dist-packages/scipy/stats/_distn_infrastructure.py:1802: RuntimeWarning: overflow encountered in true_divide
  x = np.asarray((x - loc)/scale, dtype=dtyp)
/usr/local/lib/python3.8/dist-packages/scipy/stats/_continuous_distns.py:247: RuntimeWarning: overflow encountered in square
  return -x**2 / 2.0 - _norm_pdf_logC
[iter 0] loss=38494.0861 val_loss=0.0000 scale=0.0000 norm=0.5875
/usr/local/lib/python3.8/dist-packages/ngboost/distns/normal.py:67: RuntimeWarning: overflow encountered in exp
  self.scale = np.exp(params[1])
/usr/local/lib/python3.8/dist-packages/ngboost/distns/normal.py:68: RuntimeWarning: overflow encountered in square
  self.var = self.scale ** 2
/usr/local/lib/python3.8/dist-packages/scipy/stats/_distn_infrastructure.py:1802: RuntimeWarning: divide by zero encountered in true_divide
  x = np.asarray((x - loc)/scale, dtype=dtyp)
/usr/local/lib/python3.8/dist-packages/scipy/stats/_distn_infrastructure.py:1802: RuntimeWarning: overflow encountered in true_divide
  x = np.asarray((x - loc)/scale, dtype=dtyp)
/usr/local/lib/python3.8/dist-packages/scipy/stats/_continuous_distns.py:247: RuntimeWarning: overflow encountered in square
  return -x**2 / 2.0 - _norm_pdf_logC
/usr/local/lib/python3.8/dist-packages/ngboost/distns/normal.py:67: RuntimeWarning: overflow encountered in exp
  self.scale = np.exp(params[1])
/usr/local/lib/python3.8/dist-packages/ngboost/distns/normal.py:68: RuntimeWarning: overflow encountered in square
  self.var = self.scale ** 2

CBach94 avatar Nov 08 '20 19:11 CBach94
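As a side note on the DataConversionWarning in the log above: the message itself suggests flattening the target with `ravel()`. A tiny sketch with made-up values (this addresses only that warning, and there is no indication it is related to the overflow):

```python
import numpy as np

# Illustrative target stored as a column vector, shape (3, 1).
y_column = np.array([[0.1], [0.5], [0.9]])

# Flatten to shape (3,), the 1d layout that scikit-learn asks for.
y_flat = y_column.ravel()

print(y_column.shape, y_flat.shape)
```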

I think this problem still persists even in the current release.


 [iter 0] loss=-1.6067 val_loss=0.0000 scale=1.0000 norm=0.4624
/home/ahmed/.local/lib/python3.8/site-packages/ngboost/distns/normal.py:71: RuntimeWarning: overflow encountered in square
  self.var = self.scale ** 2
/home/ahmed/.local/lib/python3.8/site-packages/ngboost/distns/normal.py:71: RuntimeWarning: overflow encountered in square
  self.var = self.scale ** 2
/home/ahmed/.local/lib/python3.8/site-packages/ngboost/distns/normal.py:71: RuntimeWarning: overflow encountered in square
  self.var = self.scale ** 2
/home/ahmed/.local/lib/python3.8/site-packages/ngboost/distns/normal.py:71: RuntimeWarning: overflow encountered in square
  self.var = self.scale ** 2
/home/ahmed/.local/lib/python3.8/site-packages/ngboost/distns/normal.py:70: RuntimeWarning: overflow encountered in exp
  self.scale = np.exp(params[1])
/home/ahmed/.local/lib/python3.8/site-packages/ngboost/distns/normal.py:71: RuntimeWarning: overflow encountered in square
  self.var = self.scale ** 2
/home/ahmed/.local/lib/python3.8/site-packages/ngboost/distns/normal.py:71: RuntimeWarning: overflow encountered in square
  self.var = self.scale ** 2
/home/ahmed/.local/lib/python3.8/site-packages/ngboost/distns/normal.py:71: RuntimeWarning: overflow encountered in square
  self.var = self.scale ** 2
/home/ahmed/.local/lib/python3.8/site-packages/ngboost/distns/normal.py:70: RuntimeWarning: overflow encountered in exp
  self.scale = np.exp(params[1])
/home/ahmed/.local/lib/python3.8/site-packages/ngboost/distns/normal.py:71: RuntimeWarning: overflow encountered in square
  self.var = self.scale ** 2
/home/ahmed/.local/lib/python3.8/site-packages/ngboost/distns/normal.py:70: RuntimeWarning: overflow encountered in exp
  self.scale = np.exp(params[1])
/home/ahmed/.local/lib/python3.8/site-packages/ngboost/distns/normal.py:71: RuntimeWarning: overflow encountered in square
  self.var = self.scale ** 2
/home/ahmed/.local/lib/python3.8/site-packages/ngboost/distns/normal.py:71: RuntimeWarning: overflow encountered in square
  self.var = self.scale ** 2
/home/ahmed/.local/lib/python3.8/site-packages/ngboost/distns/normal.py:71: RuntimeWarning: overflow encountered in square
  self.var = self.scale ** 2
/home/ahmed/.local/lib/python3.8/site-packages/ngboost/distns/normal.py:71: RuntimeWarning: overflow encountered in square
  self.var = self.scale ** 2
/home/ahmed/.local/lib/python3.8/site-packages/ngboost/distns/normal.py:70: RuntimeWarning: overflow encountered in exp
  self.scale = np.exp(params[1])
/home/ahmed/.local/lib/python3.8/site-packages/ngboost/distns/normal.py:71: RuntimeWarning: overflow encountered in square
  self.var = self.scale ** 2
/home/ahmed/.local/lib/python3.8/site-packages/ngboost/distns/normal.py:71: RuntimeWarning: overflow encountered in square
  self.var = self.scale ** 2
/home/ahmed/.local/lib/python3.8/site-packages/ngboost/distns/normal.py:70: RuntimeWarning: overflow encountered in exp
  self.scale = np.exp(params[1])
/home/ahmed/.local/lib/python3.8/site-packages/ngboost/distns/normal.py:71: RuntimeWarning: overflow encountered in square
  self.var = self.scale ** 2
/home/ahmed/.local/lib/python3.8/site-packages/ngboost/distns/normal.py:71: RuntimeWarning: overflow encountered in square
  self.var = self.scale ** 2
/home/ahmed/.local/lib/python3.8/site-packages/ngboost/distns/normal.py:71: RuntimeWarning: overflow encountered in square
  self.var = self.scale ** 2
/home/ahmed/.local/lib/python3.8/site-packages/ngboost/distns/normal.py:70: RuntimeWarning: overflow encountered in exp
  self.scale = np.exp(params[1])
/home/ahmed/.local/lib/python3.8/site-packages/ngboost/distns/normal.py:71: RuntimeWarning: overflow encountered in square
  self.var = self.scale ** 2
/home/ahmed/.local/lib/python3.8/site-packages/ngboost/distns/normal.py:71: RuntimeWarning: overflow encountered in square
  self.var = self.scale ** 2
/home/ahmed/.local/lib/python3.8/site-packages/ngboost/distns/normal.py:71: RuntimeWarning: overflow encountered in square
  self.var = self.scale ** 2
/home/ahmed/.local/lib/python3.8/site-packages/ngboost/distns/normal.py:71: RuntimeWarning: overflow encountered in square
  self.var = self.scale ** 2
/home/ahmed/.local/lib/python3.8/site-packages/ngboost/distns/normal.py:70: RuntimeWarning: overflow encountered in exp
  self.scale = np.exp(params[1])
/home/ahmed/.local/lib/python3.8/site-packages/ngboost/distns/normal.py:71: RuntimeWarning: overflow encountered in square
  self.var = self.scale ** 2
/home/ahmed/.local/lib/python3.8/site-packages/ngboost/distns/normal.py:70: RuntimeWarning: overflow encountered in exp
  self.scale = np.exp(params[1])
/home/ahmed/.local/lib/python3.8/site-packages/ngboost/distns/normal.py:71: RuntimeWarning: overflow encountered in square
  self.var = self.scale ** 2
/home/ahmed/.local/lib/python3.8/site-packages/ngboost/distns/normal.py:70: RuntimeWarning: overflow encountered in exp
  self.scale = np.exp(params[1])
/home/ahmed/.local/lib/python3.8/site-packages/ngboost/distns/normal.py:71: RuntimeWarning: overflow encountered in square
  self.var = self.scale ** 2
[iter 100] loss=-8.6189 val_loss=0.0000 scale=1.0000 norm=0.5774
/home/ahmed/.local/lib/python3.8/site-packages/ngboost/distns/normal.py:71: RuntimeWarning: overflow encountered in square
  self.var = self.scale ** 2
/home/ahmed/.local/lib/python3.8/site-packages/ngboost/distns/normal.py:70: RuntimeWarning: overflow encountered in exp
  self.scale = np.exp(params[1])
/home/ahmed/.local/lib/python3.8/site-packages/ngboost/distns/normal.py:71: RuntimeWarning: overflow encountered in square
  self.var = self.scale ** 2
[iter 100] loss=-7.8979 val_loss=0.0000 scale=1.0000 norm=0.6198
[iter 100] loss=-8.4725 val_loss=0.0000 scale=1.0000 norm=0.5463
/home/ahmed/.local/lib/python3.8/site-packages/ngboost/distns/normal.py:71: RuntimeWarning: overflow encountered in square
  self.var = self.scale ** 2
/home/ahmed/.local/lib/python3.8/site-packages/ngboost/distns/normal.py:71: RuntimeWarning: overflow encountered in square
  self.var = self.scale ** 2
/home/ahmed/.local/lib/python3.8/site-packages/ngboost/distns/normal.py:71: RuntimeWarning: overflow encountered in square
  self.var = self.scale ** 2
/home/ahmed/.local/lib/python3.8/site-packages/ngboost/distns/normal.py:71: RuntimeWarning: overflow encountered in square
  self.var = self.scale ** 2
/home/ahmed/.local/lib/python3.8/site-packages/ngboost/distns/normal.py:70: RuntimeWarning: overflow encountered in exp
  self.scale = np.exp(params[1])
/home/ahmed/.local/lib/python3.8/site-packages/ngboost/distns/normal.py:71: RuntimeWarning: overflow encountered in square
  self.var = self.scale ** 2
/home/ahmed/.local/lib/python3.8/site-packages/ngboost/distns/normal.py:71: RuntimeWarning: overflow encountered in square
  self.var = self.scale ** 2
/home/ahmed/.local/lib/python3.8/site-packages/ngboost/distns/normal.py:70: RuntimeWarning: overflow encountered in exp
  self.scale = np.exp(params[1])
[iter 100] loss=-8.9788 val_loss=0.0000 scale=1.0000 norm=0.5714
[iter 100] loss=-8.7467 val_loss=0.0000 scale=1.0000 norm=0.5202
/home/ahmed/.local/lib/python3.8/site-packages/ngboost/distns/normal.py:70: RuntimeWarning: overflow encountered in exp
  self.scale = np.exp(params[1])
/home/ahmed/.local/lib/python3.8/site-packages/ngboost/distns/normal.py:71: RuntimeWarning: overflow encountered in square
  self.var = self.scale ** 2
/home/ahmed/.local/lib/python3.8/site-packages/ngboost/distns/normal.py:71: RuntimeWarning: overflow encountered in square
  self.var = self.scale ** 2
[Parallel(n_jobs=-1)]: Done   5 out of  10 | elapsed:   22.7s remaining:   22.7s
joblib.externals.loky.process_executor._RemoteTraceback: 
"""
Traceback (most recent call last):
  File "/home/ahmed/.local/lib/python3.8/site-packages/joblib/externals/loky/process_executor.py", line 431, in _process_worker
    r = call_item()
  File "/home/ahmed/.local/lib/python3.8/site-packages/joblib/externals/loky/process_executor.py", line 285, in __call__
    return self.fn(*self.args, **self.kwargs)
  File "/home/ahmed/.local/lib/python3.8/site-packages/joblib/_parallel_backends.py", line 595, in __call__
    return self.func(*args, **kwargs)
  File "/home/ahmed/.local/lib/python3.8/site-packages/joblib/parallel.py", line 262, in __call__
    return [func(*args, **kwargs)
  File "/home/ahmed/.local/lib/python3.8/site-packages/joblib/parallel.py", line 262, in <listcomp>
    return [func(*args, **kwargs)
  File "/home/ahmed/.local/lib/python3.8/site-packages/sklearn/model_selection/_validation.py", line 531, in _fit_and_score
    estimator.fit(X_train, y_train, **fit_params)
  File "/home/ahmed/.local/lib/python3.8/site-packages/ngboost/ngboost.py", line 255, in fit
    grads = D.grad(Y_batch, natural=self.natural_gradient)
  File "/home/ahmed/.local/lib/python3.8/site-packages/ngboost/scores.py", line 12, in grad
    grad = np.linalg.solve(metric, grad)
  File "<__array_function__ internals>", line 5, in solve
  File "/home/ahmed/.local/lib/python3.8/site-packages/numpy/linalg/linalg.py", line 394, in solve
    r = gufunc(a, b, signature=signature, extobj=extobj)
  File "/home/ahmed/.local/lib/python3.8/site-packages/numpy/linalg/linalg.py", line 88, in _raise_linalgerror_singular
    raise LinAlgError("Singular matrix")
numpy.linalg.LinAlgError: Singular matrix
"""

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/ahmed/.local/lib/python3.8/site-packages/skopt/searchcv.py", line 691, in fit
    optim_result = self._step(
  File "/home/ahmed/.local/lib/python3.8/site-packages/skopt/searchcv.py", line 578, in _step
    self._fit(X, y, groups, params_dict)
  File "/home/ahmed/.local/lib/python3.8/site-packages/skopt/searchcv.py", line 409, in _fit
    out = Parallel(
  File "/home/ahmed/.local/lib/python3.8/site-packages/joblib/parallel.py", line 1054, in __call__
    self.retrieve()
  File "/home/ahmed/.local/lib/python3.8/site-packages/joblib/parallel.py", line 933, in retrieve
    self._output.extend(job.get(timeout=self.timeout))
  File "/home/ahmed/.local/lib/python3.8/site-packages/joblib/_parallel_backends.py", line 542, in wrap_future_result
    return future.result(timeout=timeout)
  File "/usr/lib/python3.8/concurrent/futures/_base.py", line 439, in result
    return self.__get_result()
  File "/usr/lib/python3.8/concurrent/futures/_base.py", line 388, in __get_result
    raise self._exception
numpy.linalg.LinAlgError: Singular matrix 

athammad avatar Feb 19 '21 04:02 athammad

@tonyduan do you think this might be due to a combination of the log(scale) parametrization and the scale-up part of the line search? Maybe we could try implementing a different line search algorithm?

alejandroschuler avatar Feb 19 '21 04:02 alejandroschuler

@alejandroschuler it's possible, though without having a closer look at the dataset and debugging it's hard to say for sure.

@athammad can you try normalizing your outputs to zero mean and unit variance? And if possible, it'd be very helpful if you could attach matrices (X, y) into this issue so that we can reproduce.

tonyduan avatar Feb 21 '21 22:02 tonyduan
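For anyone wanting to try the zero-mean, unit-variance suggestion, here is a minimal NumPy sketch. The helper names `standardize` and `unstandardize_normal` are hypothetical, the target values are made up, and the back-transformation assumes the model predicts a Normal distribution on the standardized scale.

```python
import numpy as np


def standardize(y):
    """Return the standardized target together with the mean and std used."""
    y = np.asarray(y, dtype=float)
    mean, std = y.mean(), y.std()
    return (y - mean) / std, mean, std


def unstandardize_normal(loc_z, scale_z, mean, std):
    """Map Normal(loc_z, scale_z) on the standardized scale back to original units."""
    return loc_z * std + mean, scale_z * std


# Made-up target values, only for illustration.
y = np.array([0.1, 2.0, 15.0, 68.0])
y_z, y_mean, y_std = standardize(y)

# A standard Normal prediction on the z-scale maps back to (y_mean, y_std).
loc, scale = unstandardize_normal(0.0, 1.0, y_mean, y_std)
print(y_z, loc, scale)
```

The model would then be fit on `y_z` instead of `y`, and predicted location and scale mapped back afterwards. Whether this removes the warnings for a given dataset is not established in this thread.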

Dear @tonyduan,

Here are my dataset and the code that I have been using. As you will notice, I am using Bayesian optimization through scikit-optimize. Please note the following points: 1) I encountered the same error even with the Boston dataset, but I cannot reproduce it. 2) I have encountered another error: ValueError: Input contains NaN, infinity or a value too large for dtype('float64'). 3) Even with my data, it might take a while before seeing the error(s).

Also, I would appreciate some clarification regarding why normalizing the output to zero mean and unit variance could help and its relation with a tree-based algorithm. In my understanding, it's not needed.

from ngboost import NGBRegressor
import numpy as np
import pandas as pd
from sklearn.metrics import r2_score,mean_squared_error
from ngboost.distns import Exponential, Normal,LogNormal
from ngboost.scores import LogScore, CRPScore
from sklearn.tree import DecisionTreeRegressor
from sklearn.base import clone
from ngboost.learners import default_tree_learner, default_linear_learner
import math

NITER=10
CV=5

#Set a seed value
seed_value= 12321 
#1. Set `PYTHONHASHSEED` environment variable at a fixed value
import os
os.environ['PYTHONHASHSEED']=str(seed_value)
#2. Set `python` built-in pseudo-random generator at a fixed value
import random
random.seed(seed_value)
#3. Set `numpy` pseudo-random generator at a fixed value
np.random.seed(seed_value)

#Load Data
import datatable as dt
TrainDataTS= dt.fread('TrainDataTS.csv')
testDataTS= dt.fread('testDataTS.csv')
TrainDataTS.head(5)
X_train=TrainDataTS[:,1:].to_numpy()
Y_train=np.ravel(TrainDataTS[:,'EVI'])


base_models = [
    DecisionTreeRegressor(criterion='friedman_mse', max_depth=i)
    for i in range(2, 11)
]

from skopt import BayesSearchCV
from skopt.space import Real, Categorical, Integer
        
param_dists = {
    'Base': Categorical(base_models),
    'n_estimators': Integer(100,3000),
    'learning_rate': Real(0.0001,1.0,'log-uniform'),
    'col_sample': Real(0.5,1.0),
    'minibatch_frac': Real(0.5,1.0),
    'Dist': Categorical([Normal, LogNormal]),
    'Score': Categorical([CRPScore,LogScore])
}


ngb = NGBRegressor(verbose=True, random_state=seed_value)

by_search=BayesSearchCV(ngb, param_dists, n_iter=NITER,random_state=seed_value,
                                    scoring="neg_root_mean_squared_error", n_jobs=-1, cv=CV, verbose=2,refit=True)


by_search.fit(X_train, Y_train)


TrainData.zip

Cheers, Ahmed

athammad avatar Feb 25 '21 11:02 athammad

@tonyduan maybe we should auto-scale user data as an internal preprocessing step

alejandroschuler avatar Feb 25 '21 17:02 alejandroschuler

It sounds like you know the reason for these errors? Can you please share? We also receive them, but the target is scaled.

cosmin-novac avatar Apr 13 '22 07:04 cosmin-novac

It's still not clear to me. My current hypothesis is that it's the upscaling in the line search part of the algorithm. An easy way to test that would be to cap the scaling and see if it solves the problem and doesn't break anything else too badly.

The line search is an interchangeable part of the algorithm and doesn't need to be implemented as it currently is (backtracking line search). See p. 5 of the NGBoost paper:

The output of the fitted base learner is the projection of the natural gradient on to the range of the base learner class. This projected gradient is then scaled by a scaling factor ρ since local approximations might not hold true very far away from the current parameter position. The scaling factor is chosen to minimize the overall true scoring rule loss along the direction of the projected gradient in the form of a line search. In practice, we found that implementing this line search by successive halving of ρ (starting with ρ = 1) until the scaled gradient update results in a lower overall loss relative to the previous iteration works reasonably well and is easy to implement.

I don't have any time to test and update this myself but pull requests are welcome.

alejandroschuler avatar Apr 13 '22 22:04 alejandroschuler
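To illustrate the idea of capping the scaling, here is a generic sketch of a line search with a capped scale-up phase followed by successive halving, in the spirit of the paper excerpt above. It is not ngboost's actual implementation; `line_search`, `max_scale`, and `max_halvings` are hypothetical names, and the loss is a toy quadratic.

```python
import numpy as np


def line_search(loss_fn, params, direction, max_scale=1.0, max_halvings=30):
    """Pick a step scale rho for the update params - rho * direction.

    Phase 1 doubles rho while the loss keeps improving, never beyond max_scale.
    Phase 2 halves rho until the loss is finite and lower than at params.
    Returns 0.0 if no improving step is found.
    """
    base_loss = loss_fn(params)
    scale = 1.0

    # Scale-up phase, capped at max_scale (max_scale=1.0 disables it entirely).
    while scale * 2 <= max_scale:
        bigger = loss_fn(params - 2 * scale * direction)
        current = loss_fn(params - scale * direction)
        if not bigger < current:
            break
        scale *= 2

    # Successive halving until the step improves on the starting loss.
    for _ in range(max_halvings):
        new_loss = loss_fn(params - scale * direction)
        if np.isfinite(new_loss) and new_loss < base_loss:
            return scale
        scale *= 0.5
    return 0.0


def quadratic(p):
    return float(np.sum(p ** 2))


start = np.array([4.0])
rho_capped = line_search(quadratic, start, np.array([1.0]), max_scale=2.0)
rho_free = line_search(quadratic, start, np.array([1.0]), max_scale=8.0)
rho_halved = line_search(quadratic, start, np.array([8.0]), max_scale=1.0)
print(rho_capped, rho_free, rho_halved)
```

With the cap at 2.0 the step cannot grow past 2.0 even though a larger step would lower the toy loss further. That is the trade-off to check when testing whether a cap removes the overflow without hurting fit quality.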

Interestingly, this issue appears quite randomly for me. I did scale the output variables (they are quite highly correlated), but one run may work and on the next run I get a numpy.linalg.LinAlgError: Singular matrix error. I do get overflow warnings all the time, though.

tim-habitat avatar Dec 16 '22 12:12 tim-habitat
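One possible link between the overflow warnings and the Singular matrix error, offered as an unconfirmed sketch: the traceback earlier in the thread shows `np.linalg.solve(metric, grad)`, and for a Normal distribution in (mean, log-scale) coordinates the Fisher information is diag(1/sigma^2, 2). If sigma^2 overflows to infinity, the first diagonal entry becomes exactly zero and the matrix is singular. The helpers `normal_fisher` and `safe_solve` below are hypothetical and not part of ngboost.

```python
import numpy as np


def normal_fisher(log_scale):
    """Fisher information of a Normal in (mean, log-scale) coordinates."""
    with np.errstate(over="ignore"):
        var = np.exp(np.float64(log_scale)) ** 2  # becomes inf for large log_scale
    return np.array([[1.0 / var, 0.0], [0.0, 2.0]])


def safe_solve(metric, grad):
    """Hypothetical fallback: least squares when the metric is singular."""
    try:
        return np.linalg.solve(metric, grad)
    except np.linalg.LinAlgError:
        return np.linalg.lstsq(metric, grad, rcond=None)[0]


grad = np.array([1.0, 1.0])
well_behaved = safe_solve(normal_fisher(0.0), grad)   # ordinary solve succeeds
degenerate = safe_solve(normal_fisher(400.0), grad)   # variance overflows, fallback is used
print(well_behaved, degenerate)
```

The least-squares fallback only keeps the computation going by returning the minimum-norm solution. It does not address why the scale estimate became so large in the first place.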