Problem

This looks like a very promising module. The theta example given in the tutorial runs without error, but when I tried to implement theta forecasting with this module on my own data, I got an IndexError during cross-validation.

How to fix the IndexError?

I created a brand new virtual environment (venv) with Python 3.9 and installed darts and scalecast.

Reproducible Example

import numpy as np
import pandas as pd
from scalecast.Forecaster import Forecaster

col_date = 'BillingDate'
col_val = 'TotWAC'

# data
url = "https://github.com/bhishanpdl/Shared/blob/master/data/data_scalecast/df_train.csv"
dfs = pd.read_html(url)

df_train = dfs[0].iloc[:,1:]
df_train[col_date] = pd.to_datetime(df_train[col_date])

y = df_train[col_val].to_list()
current_dates = df_train[col_date].to_list()

f = Forecaster(y=y,current_dates=current_dates)

f.set_test_length(.2)
f.generate_future_dates(90)
f.set_validation_metric('mape')

from darts.utils.utils import SeasonalityMode, TrendMode, ModelMode

theta_grid = {
    'theta':[0.5,1,1.5,2,2.5,3],
    'model_mode':[
        ModelMode.ADDITIVE,
        ModelMode.MULTIPLICATIVE
    ],
    'season_mode':[
        SeasonalityMode.MULTIPLICATIVE,
        SeasonalityMode.ADDITIVE
    ],
    'trend_mode':[
        TrendMode.EXPONENTIAL,
        TrendMode.LINEAR
    ],
}

f.set_estimator('theta')
f.ingest_grid(theta_grid)
f.cross_validate(k=3)
f.auto_forecast()

Error

---------------------------------------------------------------------------
IndexError                                Traceback (most recent call last)
Input In [9], in <cell line: 44>()
     42 f.set_estimator('theta')
     43 f.ingest_grid(theta_grid)
---> 44 f.cross_validate(k=3)
     45 f.auto_forecast()

File \venv\py39darts\lib\site-packages\scalecast\Forecaster.py:3422, in Forecaster.cross_validate(self, k, rolling, dynamic_tuning)
   3420 self.grid = grid_evaluated.iloc[:, :-3]
   3421 self.dynamic_tuning = f2.dynamic_tuning
-> 3422 self._find_best_params(grid_evaluated)
   3423 self.grid_evaluated = grid_evaluated_cv.reset_index(drop=True)
   3424 self.grid = orig_grid

File \venv\py39darts\lib\site-packages\scalecast\Forecaster.py:3434, in Forecaster._find_best_params(self, grid_evaluated)
   3429     best_params_idx = self.grid.loc[
   3430         grid_evaluated["metric_value"]
   3431         == grid_evaluated["metric_value"].max()
   3432     ].index.to_list()[0]
   3433 else:
-> 3434     best_params_idx = self.grid.loc[
   3435         grid_evaluated["metric_value"]
   3436         == grid_evaluated["metric_value"].min()
   3437     ].index.to_list()[0]
   3438 self.best_params = {
   3439     k: v[best_params_idx]
   3440     for k, v in self.grid.to_dict(orient="series").items()
   3441 }
   3442 self.best_params = {
   3443     k: (
   3444         v
   (...)
   3452     for k, v in self.best_params.items()
   3453 }

IndexError: list index out of range

System Info

$ pip freeze
absl-py==1.2.0
aiohttp==3.8.1
aiosignal==1.2.0
argon2-cffi==21.3.0
argon2-cffi-bindings==21.2.0
asttokens==2.0.8
astunparse==1.6.3
async-timeout==4.0.2
attrs==22.1.0
autopep8==1.7.0
backcall==0.2.0
beautifulsoup4==4.11.1
bleach==5.0.1
cachetools==5.2.0
catboost==1.0.6
certifi==2022.6.15
cffi==1.15.1
charset-normalizer==2.1.1
cmdstanpy==1.0.5
colorama==0.4.5
convertdate==2.4.0
cycler==0.11.0
Cython==0.29.32
darts==0.21.0
debugpy==1.6.3
decorator==5.1.1
defusedxml==0.7.1
eli5==0.13.0
entrypoints==0.4
ephem==4.1.3
et-xmlfile==1.1.0
executing==0.10.0
fastjsonschema==2.16.1
flatbuffers==1.12
fonttools==4.36.0
frozenlist==1.3.1
fsspec==2022.7.1
gast==0.4.0
google-auth==2.11.0
google-auth-oauthlib==0.4.6
google-pasta==0.2.0
graphviz==0.20.1
greenlet==1.1.2
grpcio==1.47.0
h5py==3.7.0
hijri-converter==2.2.4
holidays==0.15
html5lib==1.1
idna==3.3
importlib-metadata==4.12.0
ipykernel==6.15.1
ipython==8.4.0
ipython-genutils==0.2.0
ipywidgets==8.0.1
jedi==0.18.1
Jinja2==3.1.2
joblib==1.1.0
jsonschema==4.14.0
jupyter==1.0.0
jupyter-client==7.3.4
jupyter-console==6.4.4
jupyter-contrib-core==0.4.0
jupyter-contrib-nbextensions==0.5.1
jupyter-core==4.11.1
jupyter-highlight-selected-word==0.2.0
jupyter-latex-envs==1.4.6
jupyter-nbextensions-configurator==0.5.0
jupyterlab-pygments==0.2.2
jupyterlab-widgets==3.0.2
keras==2.9.0
Keras-Preprocessing==1.1.2
kiwisolver==1.4.4
korean-lunar-calendar==0.2.1
libclang==14.0.6
lightgbm==3.3.2
llvmlite==0.39.0
LunarCalendar==0.0.9
lxml==4.9.1
Markdown==3.4.1
MarkupSafe==2.1.1
matplotlib==3.5.3
matplotlib-inline==0.1.6
mistune==2.0.4
multidict==6.0.2
nbclient==0.6.6
nbconvert==7.0.0
nbformat==5.4.0
nest-asyncio==1.5.5
nfoursid==1.0.1
notebook==6.4.12
numba==0.56.0
numpy==1.22.4
oauthlib==3.2.0
openpyxl==3.0.10
opt-einsum==3.3.0
packaging==21.3
pandas==1.4.3
pandas-datareader==0.10.0
pandocfilters==1.5.0
parso==0.8.3
patsy==0.5.2
pickleshare==0.7.5
Pillow==9.2.0
plotly==5.10.0
pmdarima==2.0.0
prometheus-client==0.14.1
prompt-toolkit==3.0.30
prophet==1.1
protobuf==3.19.4
psutil==5.9.1
pure-eval==0.2.2
pyasn1==0.4.8
pyasn1-modules==0.2.8
pycodestyle==2.9.1
pycparser==2.21
pyDeprecate==0.3.2
Pygments==2.13.0
PyMeeus==0.5.11
pyodbc==4.0.34
pyparsing==3.0.9
pyrsistent==0.18.1
python-dateutil==2.8.2
pytorch-lightning==1.7.2
pytz==2022.2.1
pywin32==304
pywinpty==2.0.7
PyYAML==6.0
pyzmq==23.2.1
qtconsole==5.3.1
QtPy==2.2.0
requests==2.28.1
requests-oauthlib==1.3.1
rsa==4.9
SCALECAST==0.13.11
scikit-learn==1.1.2
scipy==1.9.0
seaborn==0.11.2
Send2Trash==1.8.0
setuptools-git==1.2
six==1.16.0
soupsieve==2.3.2.post1
SQLAlchemy==1.4.40
stack-data==0.4.0
statsforecast==0.6.0
statsmodels==0.13.2
tabulate==0.8.10
tbats==1.1.0
tenacity==8.0.1
tensorboard==2.9.1
tensorboard-data-server==0.6.1
tensorboard-plugin-wit==1.8.1
tensorflow==2.9.1
tensorflow-estimator==2.9.0
tensorflow-io-gcs-filesystem==0.26.0
termcolor==1.1.0
terminado==0.15.0
threadpoolctl==3.1.0
tinycss2==1.1.1
toml==0.10.2
torch==1.12.1
torchmetrics==0.9.3
tornado==6.2
tqdm==4.64.0
traitlets==5.3.0
typing_extensions==4.3.0
ujson==5.4.0
urllib3==1.26.12
watermark==2.3.1
wcwidth==0.2.5
webencodings==0.5.1
Werkzeug==2.2.2
widgetsnbextension==4.0.2
wrapt==1.14.1
xarray==2022.6.0
xgboost==1.6.2
yarl==1.8.1
zipp==3.8.1

Aug 25 '22 00:08 bhishanpdl

This particular error is caused by using mape as the validation metric when there are 0s in the validation set. In other words, it comes from this line:

f.set_validation_metric('mape')

In scalecast, mape doesn't evaluate when there are 0s in the actuals (unlike sklearn, which returns a very large number when this happens). The default validation metric is rmse for this very reason. Ideally, though, I would like a more descriptive error to be returned in these instances so that users aren't banging their heads against the wall when it happens. The problem is that it is hard to detect whether there will be 0s in the validation set at the time the validation metric is set. What I can do is explicitly check for NAs when the grid is being evaluated, so that at least the error message is more informative, but that won't change when the error is raised. I'll plan on having that ready for the next distribution update, unless you have a different preference on how to handle this situation.
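To make the mechanism concrete, here is a minimal sketch in plain Python. It is not scalecast's code: the `mape`, `rmse`, and `best_params` helpers and the sample numbers are made up for illustration, and returning NaN is only an assumed stand-in for "doesn't evaluate". It shows how a metric that is NaN for every grid row leaves nothing to select, because NaN never compares equal to itself, which would be consistent with the `[0]` lookup in the traceback failing on an empty list.

```python
import math

def mape(actuals, forecasts):
    """Mean absolute percentage error; NaN if any actual is 0 (assumed behavior)."""
    if any(a == 0 for a in actuals):
        return math.nan
    return sum(abs((a - f) / a) for a, f in zip(actuals, forecasts)) / len(actuals)

def rmse(actuals, forecasts):
    """Root mean squared error; defined even when actuals contain 0."""
    return math.sqrt(sum((a - f) ** 2 for a, f in zip(actuals, forecasts)) / len(actuals))

def best_params(scores):
    """Return every grid key whose score equals the lowest score."""
    lowest = min(scores.values())
    return [key for key, score in scores.items() if score == lowest]

# Hypothetical validation slice containing a zero, and made-up forecasts per theta.
actuals = [0.0, 10.0, 20.0]
candidates = {0.5: [1.0, 9.0, 22.0], 1.0: [0.5, 10.0, 20.0]}

mape_scores = {theta: mape(actuals, fc) for theta, fc in candidates.items()}
rmse_scores = {theta: rmse(actuals, fc) for theta, fc in candidates.items()}

best_by_mape = best_params(mape_scores)  # empty list: NaN == NaN is False
best_by_rmse = best_params(rmse_scores)  # a usable winner

print(best_by_mape, best_by_rmse)
```

Indexing `best_by_mape[0]` here would raise `IndexError: list index out of range`, while the rmse-based selection still returns a candidate.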

Tangentially, the theta model isn't able to evaluate with multiplicative seasonality and exponential trends when there are non-positive numbers in the training set. It can be annoying to have to remove those options from your grid, as they can still be the optimal parameters. A way around this could be to use the SeriesTransformer object:

import numpy as np
import pandas as pd
from scalecast.Forecaster import Forecaster
from scalecast.SeriesTransformer import SeriesTransformer

def trans_func(x,ymin):
    return [xi - ymin + 1 for xi in x]

def revert_func(x,ymin):
    return [xi + ymin - 1 for xi in x]

col_date = 'BillingDate'
col_val = 'TotWAC'

# data
url = "https://github.com/bhishanpdl/Shared/blob/master/data/data_scalecast/df_train.csv"
dfs = pd.read_html(url)

df_train = dfs[0].iloc[:,1:]
df_train[col_date] = pd.to_datetime(df_train[col_date])

y = df_train[col_val].to_list()
current_dates = df_train[col_date].to_list()

f = Forecaster(y=y,current_dates=current_dates)
ymin = f.y.min()

transformer = SeriesTransformer(f)
f = transformer.Transform(trans_func,ymin=ymin)

f.set_test_length(.2)
f.generate_future_dates(90)
f.set_validation_metric('mape')
from darts.utils.utils import SeasonalityMode, TrendMode, ModelMode

theta_grid = {
    'theta':[0.5,1,1.5,2,2.5,3],
    'model_mode':[
        ModelMode.ADDITIVE,
        ModelMode.MULTIPLICATIVE
    ],
    'season_mode':[
        SeasonalityMode.MULTIPLICATIVE,
        SeasonalityMode.ADDITIVE,
    ],
    'trend_mode':[
        TrendMode.EXPONENTIAL,
        TrendMode.LINEAR
    ],
}

f.set_estimator('theta')
f.ingest_grid(theta_grid)
f.cross_validate(k=3)
f.auto_forecast()

# revert back to original values
f = transformer.Revert(revert_func,full=True,ymin=ymin)
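As a quick sanity check of the shift used above, the sketch below applies `trans_func` and `revert_func` to a small made-up list (the values are illustrative, not taken from the dataset) without involving scalecast at all. It shows that the transformed series has a minimum of 1, so it is strictly positive, and that reverting restores the original values.

```python
def trans_func(x, ymin):
    # Shift the series so that its smallest value becomes 1.
    return [xi - ymin + 1 for xi in x]

def revert_func(x, ymin):
    # Undo the shift applied by trans_func.
    return [xi + ymin - 1 for xi in x]

# Made-up series containing a negative value and a zero.
y = [-3.0, 0.0, 4.5, 12.0]
ymin = min(y)

shifted = trans_func(y, ymin)
restored = revert_func(shifted, ymin)

print(min(shifted))   # smallest transformed value
print(restored == y)  # round trip check
```

Because the shift is a constant offset, percentage-based metrics such as mape computed on the transformed series are not directly comparable to those on the original scale.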

Let me know if this answers the concern.

Aug 25 '22 01:08 mikekeith52

Thanks a lot for introducing the transformers. I had been using PyCaret with prophet, xgboost, lightgbm, and catboost and was getting an absolute percent difference of 21% on the total sum for the last 90 days, but using this theta regressor with the transformer I got 19%.

I am really impressed with this module and the code is running fine now.

I have a follow-up question, though.

Follow up Question: How to suppress the warning?

WARNING:darts.models.forecasting.theta:Time series has negative values. Fallback to additive and linear model

I get many lines of this warning.

Thanks a lot for the quick response, and kudos for the hard work of developing such a great module.

Aug 26 '22 23:08 bhishanpdl

Never mind, I looked up the warning on the darts GitHub, and there is already an issue raised for it there. For now, I can suppress the warnings (and anything else a cell prints) using the %%capture cell magic in a Jupyter notebook. We can close the issue.
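A narrower alternative to %%capture might be to raise the level of the logger that emits the message, using only the standard library. This is a sketch under an assumption: the logger name below is inferred from the `WARNING:darts.models.forecasting.theta:` prefix of the message, and whether darts routes the warning through that logger in every version has not been verified here.

```python
import logging

# Logger name inferred from the warning prefix (assumption, not confirmed against darts).
theta_logger = logging.getLogger("darts.models.forecasting.theta")

# Only messages at ERROR level or above will pass through this logger.
theta_logger.setLevel(logging.ERROR)
```

Unlike %%capture, this leaves the rest of the cell output visible and applies for the whole session once it has run.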

Aug 26 '22 23:08 bhishanpdl

Great! I'm glad you are enjoying the package. Thank you for contributing to help improve it.

The informative error message will be part of the 0.14.0 distribution but per your request, I will close the issue.

Aug 27 '22 00:08 mikekeith52

Possible Bug: f.forecast gives Index error
