
Error: Input contains NaN, infinity or a value too large for dtype('float64'): pmdarima.predict()

Open joshi-abhishek opened this issue 4 years ago • 22 comments

Describe the bug
The method exits abruptly with the error below: ValueError: Input contains NaN, infinity or a value too large for dtype('float64').

However, the data is clean and contains no NaN, infinite, or excessively large values.

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-56-cb145de96983> in <module>
      4 model_arima = auto_arima(data_tra, start_p = 0, start_q = 0, max_p = 12, max_q = 12, m = 12, start_P = 0, start_Q = 0, seasonal = False, error_action = 'ignore', suppress_warnings = True, stepwise = True)
----> 6 forecast_arima = model_arima.predict(n_periods = 18, return_conf_int = True, alpha = 0.05)

/opt/anaconda/envs/shared/lib/python3.7/site-packages/pmdarima/arima/arima.py in predict(self, n_periods, exogenous, return_conf_int, alpha)
    651             end=end,
    652             exog=exogenous,
--> 653             alpha=alpha)
    654 
    655         if return_conf_int:

/opt/anaconda/envs/shared/lib/python3.7/site-packages/pmdarima/arima/arima.py in _seasonal_prediction_with_confidence(arima_res, start, end, exog, alpha, **kwargs)
     81     conf_int = results.conf_int(alpha=alpha)
     82     return check_endog(f, dtype=None, copy=False), \
---> 83         check_array(conf_int, copy=False, dtype=None)
     84 
     85 

/opt/anaconda/envs/shared/lib/python3.7/site-packages/sklearn/utils/validation.py in inner_f(*args, **kwargs)
     70                           FutureWarning)
     71         kwargs.update({k: arg for k, arg in zip(sig.parameters, args)})
---> 72         return f(**kwargs)
     73     return inner_f
     74 

/opt/anaconda/envs/shared/lib/python3.7/site-packages/sklearn/utils/validation.py in check_array(array, accept_sparse, accept_large_sparse, dtype, order, copy, force_all_finite, ensure_2d, allow_nd, ensure_min_samples, ensure_min_features, estimator)
    643         if force_all_finite:
    644             _assert_all_finite(array,
--> 645                                allow_nan=force_all_finite == 'allow-nan')
    646 
    647     if ensure_min_samples > 0:

/opt/anaconda/envs/shared/lib/python3.7/site-packages/sklearn/utils/validation.py in _assert_all_finite(X, allow_nan, msg_dtype)
     97                     msg_err.format
     98                     (type_err,
---> 99                      msg_dtype if msg_dtype is not None else X.dtype)
    100             )
    101     # for object dtype data, we only check for NaNs (GH-13254)

ValueError: Input contains NaN, infinity or a value too large for dtype('float64').
-----------------------------------------------------------------------------------------------------

To Reproduce
Steps to reproduce the behavior:

data:

[1872.0, 1452.0, 1476.0, 1404.0, 3048.0, 1788.0, 1080.0, 888.0, 2184.0, 2220.0, 1680.0,
612.0, 2124.0, 486.0, 1968.0, 924.0, 888.0, 1756.0, 1104.0, 876.0, 888.0, 1608.0, 1896.0,
648.0, 1524.0, 804.0, 816.0, 1944.0, 1512.0, 900.0, 1464.0, 876.0, 1464.0, 2136.0, 732.0, 
1764.0, 840.0, 1860.0, 792.0, 1728.0, 768.0, 1080.0, 876.0, 1716.0, 900.0, 1740.0, 888.0, 
2172.0, 486.0]

Code:

from pmdarima.arima import auto_arima

model_arima = auto_arima(data, start_p = 0, start_q = 0, max_p = 12, max_q = 12, m = 12, start_P = 0, start_Q = 0, seasonal = False, error_action = 'ignore', suppress_warnings = True, stepwise = True)
forecast_arima = model_arima.predict(n_periods = 18, return_conf_int = False, alpha = 0.05)

Versions

import pmdarima; pmdarima.show_versions()

System:
    python: 3.7.9 (default, Aug 31 2020, 12:42:55)  [GCC 7.3.0]
executable: /opt/anaconda/envs/shared/bin/python
   machine: Linux-4.4.0-1114-aws-x86_64-with-debian-stretch-sid

Python dependencies:
        pip: 20.2.3
 setuptools: 49.6.0.post20200917
    sklearn: 0.23.2
statsmodels: 0.12.0
      numpy: 1.19.1
      scipy: 1.5.2
     Cython: 0.29.21
     pandas: 0.25.3
     joblib: 0.16.0
   pmdarima: 1.7.1

Expected behavior There should be no error.

Actual behavior

Additional context

joshi-abhishek avatar Dec 07 '20 07:12 joshi-abhishek

Can you try updating your version? This works on 1.8.0:

In [1]: data = [1872.0, 1452.0, 1476.0, 1404.0, 3048.0, 1788.0, 1080.0, 888.0, 2184.0, 2220.0, 1680.0,
   ...: 612.0, 2124.0, 486.0, 1968.0, 924.0, 888.0, 1756.0, 1104.0, 876.0, 888.0, 1608.0, 1896.0,
   ...: 648.0, 1524.0, 804.0, 816.0, 1944.0, 1512.0, 900.0, 1464.0, 876.0, 1464.0, 2136.0, 732.0,
   ...: 1764.0, 840.0, 1860.0, 792.0, 1728.0, 768.0, 1080.0, 876.0, 1716.0, 900.0, 1740.0, 888.0,
   ...: 2172.0, 486.0]

In [2]: from pmdarima.arima import auto_arima
   ...:
   ...: model_arima = auto_arima(data, start_p = 0, start_q = 0, max_p = 12, max_q = 12, m = 12, start_P = 0, start_Q = 0, seasonal = False, error_action = 'ignore', suppress_warnings = True, stepwise = True)
   ...: forecast_arima = model_arima.predict(n_periods = 18, return_conf_int = False, alpha = 0.05)

In [3]: forecast_arima
Out[3]:
array([1742.03281905, 1038.44297599, 1677.5632002 , 1122.01177781,
       1504.58931217, 1021.85945799, 1588.49173444, 1202.38369947,
       1480.27656245, 1170.41755339, 1407.33114539, 1250.95355177,
       1452.51653705, 1248.930108  , 1375.22988857, 1258.86745029,
       1391.23966826, 1303.01297922])

pip install --upgrade pmdarima

tgsmith61591 avatar Dec 07 '20 14:12 tgsmith61591

It's the same error even after upgrading pmdarima.

import pmdarima; pmdarima.show_versions()

System:
    python: 3.7.9 (default, Aug 31 2020, 12:42:55)  [GCC 7.3.0]
executable: /opt/anaconda/envs/shared/bin/python
   machine: Linux-4.4.0-1114-aws-x86_64-with-debian-stretch-sid

Python dependencies:
        pip: 20.2.3
 setuptools: 49.6.0.post20200917
    sklearn: 0.23.2
statsmodels: 0.12.1
      numpy: 1.19.1
      scipy: 1.5.2
     Cython: 0.29.17
     pandas: 0.25.3
     joblib: 0.16.0
   pmdarima: 1.8.0

Error

    380     model_arima = auto_arima(data_tra, start_p = 0, start_q = 0, max_p = 12, max_q = 12, m = 12, start_P = 0, start_Q = 0, seasonal = False, error_action = 'ignore', suppress_warnings = True)
--> 381     forecast_arima = model_arima.predict(n_periods = len(tes), return_conf_int = False, alpha = ci_alpha)

/opt/anaconda/envs/shared/lib/python3.7/site-packages/sklearn/utils/validation.py in _assert_all_finite(X, allow_nan, msg_dtype)
     97                     msg_err.format
     98                     (type_err,
---> 99                      msg_dtype if msg_dtype is not None else X.dtype)
    100             )
    101     # for object dtype data, we only check for NaNs (GH-13254)

ValueError: Input contains NaN, infinity or a value too large for dtype('float64').

Could you share your supporting library versions as well, using pmdarima.show_versions()? I read somewhere that the pandas and statsmodels versions also matter.

joshi-abhishek avatar Dec 07 '20 15:12 joshi-abhishek

In [3]: pm.show_versions()

System:
    python: 3.7.9 (default, Nov 18 2020, 14:10:47)  [GCC 8.3.0]
executable: /usr/local/bin/python
   machine: Linux-5.4.39-linuxkit-x86_64-with-debian-10.6

Python dependencies:
        pip: 20.3.1
 setuptools: 50.3.2
    sklearn: 0.23.2
statsmodels: 0.12.1
      numpy: 1.19.4
      scipy: 1.5.4
     Cython: 0.29.17
     pandas: 1.1.5
     joblib: 0.17.0
   pmdarima: 1.8.0

Keep in mind if you're having environmental issues, you can always use the docker image, and mount a volume wherever you want to save your model:

$ docker run --rm -it alkalineml/pmdarima:latest

tgsmith61591 avatar Dec 07 '20 17:12 tgsmith61591

Is this still an issue, @joshi-abhishek?

tgsmith61591 avatar Dec 10 '20 19:12 tgsmith61591

Yes. I am trying it out on different machines to check whether this is actually an issue; then we'd have a root cause identified.

joshi-abhishek avatar Dec 14 '20 08:12 joshi-abhishek

I'm facing a similar issue with data that looks like this:

test = [53930.25, 16575.5, 15593.1, 6751.15, 5408.95, 3853.0, 5119.9, 6761.55, 20449.1, 20458.05, 24501.8, 33300.4, 34285.9] 
arima_model = auto_arima(test)
arima_model.predict(n_periods=1)

The error message:

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-264-f624bf4d9f84> in <module>
      1 test = [53930.25, 16575.5, 15593.1, 6751.15, 5408.95, 3853.0, 5119.9, 6761.55, 20449.1, 20458.05, 24501.8, 33300.4, 34285.9]
      2 arima_model = auto_arima(test)
----> 3 arima_model.predict(n_periods=1)

~\miniconda3\envs\arima\lib\site-packages\pmdarima\arima\arima.py in predict(self, n_periods, X, return_conf_int, alpha, **kwargs)
    674         end = arima.nobs + n_periods - 1
    675 
--> 676         f, conf_int = _seasonal_prediction_with_confidence(
    677             arima_res=arima,
    678             start=arima.nobs,

~\miniconda3\envs\arima\lib\site-packages\pmdarima\arima\arima.py in _seasonal_prediction_with_confidence(arima_res, start, end, X, alpha, **kwargs)
     86     conf_int = results.conf_int(alpha=alpha)
     87     return check_endog(f, dtype=None, copy=False), \
---> 88         check_array(conf_int, copy=False, dtype=None)
     89 
     90 

~\miniconda3\envs\arima\lib\site-packages\sklearn\utils\validation.py in inner_f(*args, **kwargs)
     61             extra_args = len(args) - len(all_args)
     62             if extra_args <= 0:
---> 63                 return f(*args, **kwargs)
     64 
     65             # extra_args > 0

~\miniconda3\envs\arima\lib\site-packages\sklearn\utils\validation.py in check_array(array, accept_sparse, accept_large_sparse, dtype, order, copy, force_all_finite, ensure_2d, allow_nd, ensure_min_samples, ensure_min_features, estimator)
    718 
    719         if force_all_finite:
--> 720             _assert_all_finite(array,
    721                                allow_nan=force_all_finite == 'allow-nan')
    722 

~\miniconda3\envs\arima\lib\site-packages\sklearn\utils\validation.py in _assert_all_finite(X, allow_nan, msg_dtype)
    101                 not allow_nan and not np.isfinite(X).all()):
    102             type_err = 'infinity' if allow_nan else 'NaN, infinity'
--> 103             raise ValueError(
    104                     msg_err.format
    105                     (type_err,

ValueError: Input contains NaN, infinity or a value too large for dtype('float64').

Versions used:

System:
    python: 3.8.10 (default, May 19 2021, 13:12:57) [MSC v.1916 64 bit (AMD64)]
executable: C:\Users\shuvo\miniconda3\envs\arima\python.exe
   machine: Windows-10-10.0.19042-SP0

Python dependencies:
        pip: 21.1.1
 setuptools: 52.0.0.post20210125
    sklearn: 0.24.2
statsmodels: 0.12.2
      numpy: 1.19.5
      scipy: 1.6.3
     Cython: 0.29.23
     pandas: 1.2.4
     joblib: 1.0.1
   pmdarima: 1.8.2

Shuvo-saha avatar Jun 07 '21 20:06 Shuvo-saha

The trace looks like this:

Performing stepwise search to minimize aic
 ARIMA(2,0,2)(0,0,0)[0] intercept   : AIC=inf, Time=0.13 sec
 ARIMA(0,0,0)(0,0,0)[0] intercept   : AIC=289.456, Time=0.01 sec
 ARIMA(1,0,0)(0,0,0)[0] intercept   : AIC=286.625, Time=0.03 sec
 ARIMA(0,0,1)(0,0,0)[0] intercept   : AIC=292.002, Time=0.01 sec
 ARIMA(0,0,0)(0,0,0)[0]             : AIC=300.810, Time=0.00 sec
 ARIMA(2,0,0)(0,0,0)[0] intercept   : AIC=289.358, Time=0.06 sec
 ARIMA(1,0,1)(0,0,0)[0] intercept   : AIC=289.564, Time=0.06 sec
 ARIMA(2,0,1)(0,0,0)[0] intercept   : AIC=193.418, Time=0.21 sec
 ARIMA(3,0,1)(0,0,0)[0] intercept   : AIC=inf, Time=0.15 sec
 ARIMA(1,0,2)(0,0,0)[0] intercept   : AIC=inf, Time=0.10 sec
 ARIMA(3,0,0)(0,0,0)[0] intercept   : AIC=288.315, Time=0.11 sec
 ARIMA(3,0,2)(0,0,0)[0] intercept   : AIC=inf, Time=0.15 sec
 ARIMA(2,0,1)(0,0,0)[0]             : AIC=inf, Time=0.06 sec

Best model:  ARIMA(2,0,1)(0,0,0)[0] intercept
Total fit time: 1.097 seconds

Shuvo-saha avatar Jun 07 '21 20:06 Shuvo-saha

@Shuvo-saha I get a different model with your data, and cannot reproduce the error:

In [5]: test = [53930.25, 16575.5, 15593.1, 6751.15, 5408.95, 3853.0, 5119.9, 6761.55, 20449.1, 20458.05, 24501.8, 33300.4, 34285.9]
   ...: arima_model = auto_arima(test, trace=True)
   ...: arima_model.predict(n_periods=1)
Performing stepwise search to minimize aic
 ARIMA(2,0,2)(0,0,0)[0] intercept   : AIC=inf, Time=0.11 sec
 ARIMA(0,0,0)(0,0,0)[0] intercept   : AIC=289.456, Time=0.00 sec
 ARIMA(1,0,0)(0,0,0)[0] intercept   : AIC=286.625, Time=0.03 sec
 ARIMA(0,0,1)(0,0,0)[0] intercept   : AIC=292.002, Time=0.01 sec
 ARIMA(0,0,0)(0,0,0)[0]             : AIC=300.810, Time=0.00 sec
 ARIMA(2,0,0)(0,0,0)[0] intercept   : AIC=289.358, Time=0.05 sec
 ARIMA(1,0,1)(0,0,0)[0] intercept   : AIC=289.564, Time=0.05 sec
 ARIMA(2,0,1)(0,0,0)[0] intercept   : AIC=inf, Time=0.06 sec
 ARIMA(1,0,0)(0,0,0)[0]             : AIC=287.546, Time=0.02 sec

Best model:  ARIMA(1,0,0)(0,0,0)[0] intercept
Total fit time: 0.345 seconds

Out[5]: array([31676.81437161])

tgsmith61591 avatar Jul 22 '21 21:07 tgsmith61591

Hi @joshi-abhishek, Is your issue resolved?

aakashparsi avatar Aug 18 '21 11:08 aakashparsi

I am currently facing this exact issue. Did you ever manage to resolve this? @joshi-abhishek @aakashparsi

shanemcquillan1994 avatar Sep 09 '21 11:09 shanemcquillan1994

The problem happens due to extremely large errors when the autoARIMA can't find a good solution

Shuvo-saha avatar Sep 09 '21 11:09 Shuvo-saha

The problem happens due to extremely large errors when the autoARIMA can't find a good solution

Yes, maybe you are right. The series that fails for me looks like this (I can't paste a picture, so here is the weekly data):

[41.0, 65.0, 80.0, 67.0, 49.0, 53.0, 54.0, 61.0, 36.0, 40.0, 37.0, 48.0, 40.0, 37.0, 32.0, 40.0, 41.0, 28.0, 37.0, 37.0, 29.0, 25.0, 46.0, 28.0, 41.0, 42.0, 87.0, 106.0, 64.0, 0, 17.0, 28.0, 31.0, 44.0, 38.0, 29.0, 42.0, 16.0, 34.0, 69.0, 64.0, 29.0, 55.0, 62.0, 68.0, 52.0, 42.0, 41.0, 40.0, 42.0, 37.0, 43.0, 62.0, 55.0, 62.0, 66.0, 94.0, 82.0, 88.0, 50.0, 2.0]

zenoprod avatar Sep 18 '21 07:09 zenoprod

So the solution is not to use auto_arima if the series is difficult to forecast?

vinson2233 avatar Nov 23 '21 04:11 vinson2233

I believe this may also be caused by a prediction of NaN or Inf. Scaling the data before modeling has eased the issue for me somewhat. However, I don't think scaling should be a requirement, because I want (and need) to test on unscaled data before looking at the effects of scaling.

As this is still an issue (apparently in R as well for auto_arima), it would be great to have some ability to try/except within the function itself. Otherwise, when pipelined, there is the potential for a breaking failure during cross-validation.

Even in a software-engineered pipeline, a try/except block often fails to catch the error; I have found the program behaves as if it were separate from the try/except block. But perhaps I wasn't catching ValueError specifically?
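
For reference, catching ValueError specifically around the fit-and-predict step could be sketched as below. This is only a minimal sketch, not something pmdarima provides: `safe_forecast`, `pmdarima_fit_predict`, and the naive last-value fallback are hypothetical names and choices made for illustration, and `fit_predict` stands for whatever callable wraps `auto_arima(...)` and `.predict(...)` in your pipeline.

```python
from typing import Callable, List, Sequence


def safe_forecast(
    fit_predict: Callable[[Sequence[float], int], Sequence[float]],
    series: Sequence[float],
    n_periods: int,
) -> List[float]:
    """Run fit_predict; on ValueError, fall back to repeating the last value.

    Hypothetical helper, not part of pmdarima. The fallback is an
    assumption chosen only so a cross-validation loop can keep going.
    """
    try:
        return [float(v) for v in fit_predict(series, n_periods)]
    except ValueError:
        # The "Input contains NaN, infinity ..." error in this thread
        # is raised as a ValueError.
        return [float(series[-1])] * n_periods


def pmdarima_fit_predict(series, n_periods):
    # Imported lazily so safe_forecast can be used without pmdarima installed.
    from pmdarima.arima import auto_arima

    model = auto_arima(series, seasonal=False, error_action="ignore",
                       suppress_warnings=True)
    return model.predict(n_periods=n_periods)
```

Calling `safe_forecast(pmdarima_fit_predict, data, 18)` would then return either the model forecast or the fallback. Whether a silent fallback is acceptable depends on the pipeline; logging the failing series may be preferable.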

AlexanderLavelle avatar Mar 08 '22 13:03 AlexanderLavelle

My case is peculiar. Auto arima was run in a loop after a group-by on different IDs. Each ID had between 25 and 28 dates and the prediction was for a single day. It was working fine until one day it threw the ominous error in the subject line. After deep research it turned out that the ID causing the failure had 25 dates, of which 2 were non-consecutive. Removing those non-consecutive dates fixed the issue. What I still do not understand is why that happened, since auto arima is run on numerical arrays with no reference to dates...
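
A check for this kind of gap, run per ID before the values are passed to auto arima, might look like the sketch below. It assumes daily data and one list of dates per ID; `find_gaps` is a hypothetical helper, not a pmdarima function. It only reports gaps; whether gaps are what actually triggers the error is not established in this thread.

```python
from datetime import date, timedelta
from typing import List, Sequence, Tuple


def find_gaps(
    dates: Sequence[date],
    step: timedelta = timedelta(days=1),
) -> List[Tuple[date, date]]:
    """Return (previous, current) pairs whose spacing differs from `step`."""
    ordered = sorted(dates)
    return [(prev, cur) for prev, cur in zip(ordered, ordered[1:])
            if cur - prev != step]


# Example: March 2 is followed by March 5, so one gap is reported.
dates = [date(2022, 3, 1), date(2022, 3, 2), date(2022, 3, 5), date(2022, 3, 6)]
gaps = find_gaps(dates)
print(gaps)
```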

Algrasso avatar Mar 13 '22 21:03 Algrasso

Seems possibly related to #492 (caused by a potential statsmodels bug). We have an open bug with them that we're watching.

tgsmith61591 avatar Apr 18 '22 13:04 tgsmith61591

Hi, does anyone know if there is any update anywhere about the NaN issue? Thanks!

Algrasso avatar May 11 '22 09:05 Algrasso

I am having the same issue too. These are the versions I am using. Interestingly, I get this same error whenever I split my time series into train and test frames; if I take the whole series and run it, there is no such issue. [image]

JainShubham23 avatar Jun 02 '23 16:06 JainShubham23

One workaround is to multiply the target series by any factor other than 1.

For example:

import numpy as np
from pmdarima.arima import auto_arima

data = np.array([1872.0, 1452.0, 1476.0, 1404.0, 3048.0, 1788.0, 1080.0, 888.0, 2184.0, 2220.0, 1680.0,
612.0, 2124.0, 486.0, 1968.0, 924.0, 888.0, 1756.0, 1104.0, 876.0, 888.0, 1608.0, 1896.0,
648.0, 1524.0, 804.0, 816.0, 1944.0, 1512.0, 900.0, 1464.0, 876.0, 1464.0, 2136.0, 732.0,
1764.0, 840.0, 1860.0, 792.0, 1728.0, 768.0, 1080.0, 876.0, 1716.0, 900.0, 1740.0, 888.0,
2172.0, 486.0])

# Scale the series before fitting (any factor other than 1)
data *= 0.1

model_arima = auto_arima(data, start_p = 0, start_q = 0, max_p = 12, max_q = 12, m = 12, start_P = 0, start_Q = 0, seasonal = False, error_action = 'ignore', suppress_warnings = True, stepwise = True)
forecast_arima = model_arima.predict(n_periods = 18, return_conf_int = False, alpha = 0.05)

# Undo the scaling on the forecast
forecast_arima /= 0.1
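
The same workaround can be wrapped so that the rescaling is undone in one place. This is a sketch under assumptions: `forecast_scaled` is a hypothetical helper, `fit_predict` stands for any callable that fits a model and returns point forecasts, and the thread does not establish why rescaling avoids the error.

```python
from typing import Callable, List, Sequence


def forecast_scaled(
    fit_predict: Callable[[List[float], int], Sequence[float]],
    series: Sequence[float],
    n_periods: int,
    factor: float = 0.1,
) -> List[float]:
    """Scale the series by `factor`, forecast, then divide the forecast back."""
    if factor == 0:
        raise ValueError("factor must be non-zero")
    scaled = [v * factor for v in series]
    return [f / factor for f in fit_predict(scaled, n_periods)]
```

If confidence intervals are requested as well, they would need to be divided by the same factor.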

duc-ph avatar Aug 16 '23 09:08 duc-ph

One workaround is to multiply the target series by any factor other than 1.


This seems to work perfectly :D

arzaan789 avatar Dec 18 '23 20:12 arzaan789