scalecast
Possible Bug: f.forecast gives Index error
Problem
This looks to be a very promising module. The theta example given in the tutorial runs without error, but when I tried to run theta forecasting with this module on my own data, I got an IndexError during cross-validation.
How to fix the IndexError?
I have created a brand new virtual environment (venv) with Python 3.9 and installed darts and scalecast.
Reproducible Example
import numpy as np
import pandas as pd
from scalecast.Forecaster import Forecaster
col_date = 'BillingDate'
col_val = 'TotWAC'
# data
url = "https://github.com/bhishanpdl/Shared/blob/master/data/data_scalecast/df_train.csv"
dfs = pd.read_html(url)
df_train = dfs[0].iloc[:,1:]
df_train[col_date] = pd.to_datetime(df_train[col_date])
y = df_train[col_val].to_list()
current_dates = df_train[col_date].to_list()
f = Forecaster(y=y,current_dates=current_dates)
f.set_test_length(.2)
f.generate_future_dates(90)
f.set_validation_metric('mape')
from darts.utils.utils import SeasonalityMode, TrendMode, ModelMode
theta_grid = {
    'theta':[0.5,1,1.5,2,2.5,3],
    'model_mode':[
        ModelMode.ADDITIVE,
        ModelMode.MULTIPLICATIVE
    ],
    'season_mode':[
        SeasonalityMode.MULTIPLICATIVE,
        SeasonalityMode.ADDITIVE
    ],
    'trend_mode':[
        TrendMode.EXPONENTIAL,
        TrendMode.LINEAR
    ],
}
f.set_estimator('theta')
f.ingest_grid(theta_grid)
f.cross_validate(k=3)
f.auto_forecast()
Error
---------------------------------------------------------------------------
IndexError Traceback (most recent call last)
Input In [9], in <cell line: 44>()
42 f.set_estimator('theta')
43 f.ingest_grid(theta_grid)
---> 44 f.cross_validate(k=3)
45 f.auto_forecast()
File \venv\py39darts\lib\site-packages\scalecast\Forecaster.py:3422, in Forecaster.cross_validate(self, k, rolling, dynamic_tuning)
3420 self.grid = grid_evaluated.iloc[:, :-3]
3421 self.dynamic_tuning = f2.dynamic_tuning
-> 3422 self._find_best_params(grid_evaluated)
3423 self.grid_evaluated = grid_evaluated_cv.reset_index(drop=True)
3424 self.grid = orig_grid
File \venv\py39darts\lib\site-packages\scalecast\Forecaster.py:3434, in Forecaster._find_best_params(self, grid_evaluated)
3429 best_params_idx = self.grid.loc[
3430 grid_evaluated["metric_value"]
3431 == grid_evaluated["metric_value"].max()
3432 ].index.to_list()[0]
3433 else:
-> 3434 best_params_idx = self.grid.loc[
3435 grid_evaluated["metric_value"]
3436 == grid_evaluated["metric_value"].min()
3437 ].index.to_list()[0]
3438 self.best_params = {
3439 k: v[best_params_idx]
3440 for k, v in self.grid.to_dict(orient="series").items()
3441 }
3442 self.best_params = {
3443 k: (
3444 v
(...)
3452 for k, v in self.best_params.items()
3453 }
IndexError: list index out of range
System Info
$ pip freeze
absl-py==1.2.0
aiohttp==3.8.1
aiosignal==1.2.0
argon2-cffi==21.3.0
argon2-cffi-bindings==21.2.0
asttokens==2.0.8
astunparse==1.6.3
async-timeout==4.0.2
attrs==22.1.0
autopep8==1.7.0
backcall==0.2.0
beautifulsoup4==4.11.1
bleach==5.0.1
cachetools==5.2.0
catboost==1.0.6
certifi==2022.6.15
cffi==1.15.1
charset-normalizer==2.1.1
cmdstanpy==1.0.5
colorama==0.4.5
convertdate==2.4.0
cycler==0.11.0
Cython==0.29.32
darts==0.21.0
debugpy==1.6.3
decorator==5.1.1
defusedxml==0.7.1
eli5==0.13.0
entrypoints==0.4
ephem==4.1.3
et-xmlfile==1.1.0
executing==0.10.0
fastjsonschema==2.16.1
flatbuffers==1.12
fonttools==4.36.0
frozenlist==1.3.1
fsspec==2022.7.1
gast==0.4.0
google-auth==2.11.0
google-auth-oauthlib==0.4.6
google-pasta==0.2.0
graphviz==0.20.1
greenlet==1.1.2
grpcio==1.47.0
h5py==3.7.0
hijri-converter==2.2.4
holidays==0.15
html5lib==1.1
idna==3.3
importlib-metadata==4.12.0
ipykernel==6.15.1
ipython==8.4.0
ipython-genutils==0.2.0
ipywidgets==8.0.1
jedi==0.18.1
Jinja2==3.1.2
joblib==1.1.0
jsonschema==4.14.0
jupyter==1.0.0
jupyter-client==7.3.4
jupyter-console==6.4.4
jupyter-contrib-core==0.4.0
jupyter-contrib-nbextensions==0.5.1
jupyter-core==4.11.1
jupyter-highlight-selected-word==0.2.0
jupyter-latex-envs==1.4.6
jupyter-nbextensions-configurator==0.5.0
jupyterlab-pygments==0.2.2
jupyterlab-widgets==3.0.2
keras==2.9.0
Keras-Preprocessing==1.1.2
kiwisolver==1.4.4
korean-lunar-calendar==0.2.1
libclang==14.0.6
lightgbm==3.3.2
llvmlite==0.39.0
LunarCalendar==0.0.9
lxml==4.9.1
Markdown==3.4.1
MarkupSafe==2.1.1
matplotlib==3.5.3
matplotlib-inline==0.1.6
mistune==2.0.4
multidict==6.0.2
nbclient==0.6.6
nbconvert==7.0.0
nbformat==5.4.0
nest-asyncio==1.5.5
nfoursid==1.0.1
notebook==6.4.12
numba==0.56.0
numpy==1.22.4
oauthlib==3.2.0
openpyxl==3.0.10
opt-einsum==3.3.0
packaging==21.3
pandas==1.4.3
pandas-datareader==0.10.0
pandocfilters==1.5.0
parso==0.8.3
patsy==0.5.2
pickleshare==0.7.5
Pillow==9.2.0
plotly==5.10.0
pmdarima==2.0.0
prometheus-client==0.14.1
prompt-toolkit==3.0.30
prophet==1.1
protobuf==3.19.4
psutil==5.9.1
pure-eval==0.2.2
pyasn1==0.4.8
pyasn1-modules==0.2.8
pycodestyle==2.9.1
pycparser==2.21
pyDeprecate==0.3.2
Pygments==2.13.0
PyMeeus==0.5.11
pyodbc==4.0.34
pyparsing==3.0.9
pyrsistent==0.18.1
python-dateutil==2.8.2
pytorch-lightning==1.7.2
pytz==2022.2.1
pywin32==304
pywinpty==2.0.7
PyYAML==6.0
pyzmq==23.2.1
qtconsole==5.3.1
QtPy==2.2.0
requests==2.28.1
requests-oauthlib==1.3.1
rsa==4.9
SCALECAST==0.13.11
scikit-learn==1.1.2
scipy==1.9.0
seaborn==0.11.2
Send2Trash==1.8.0
setuptools-git==1.2
six==1.16.0
soupsieve==2.3.2.post1
SQLAlchemy==1.4.40
stack-data==0.4.0
statsforecast==0.6.0
statsmodels==0.13.2
tabulate==0.8.10
tbats==1.1.0
tenacity==8.0.1
tensorboard==2.9.1
tensorboard-data-server==0.6.1
tensorboard-plugin-wit==1.8.1
tensorflow==2.9.1
tensorflow-estimator==2.9.0
tensorflow-io-gcs-filesystem==0.26.0
termcolor==1.1.0
terminado==0.15.0
threadpoolctl==3.1.0
tinycss2==1.1.1
toml==0.10.2
torch==1.12.1
torchmetrics==0.9.3
tornado==6.2
tqdm==4.64.0
traitlets==5.3.0
typing_extensions==4.3.0
ujson==5.4.0
urllib3==1.26.12
watermark==2.3.1
wcwidth==0.2.5
webencodings==0.5.1
Werkzeug==2.2.2
widgetsnbextension==4.0.2
wrapt==1.14.1
xarray==2022.6.0
xgboost==1.6.2
yarl==1.8.1
zipp==3.8.1
This particular error is caused by using mape as the validation metric when there are 0s in the validation set, in other words, this line:
f.set_validation_metric('mape')
In scalecast, mape doesn't evaluate when there are 0s in the actuals (unlike sklearn, which returns a very large number when this happens). The default validation metric is rmse for this very reason. Ideally, though, I would like a more descriptive error to be raised in these instances so that users aren't banging their heads against the wall when it happens. The problem is that it is hard to detect whether there will be 0s in the validation set at the time the validation metric is set. What I can do is explicitly check for NAs when the grid is being evaluated, so that at least the error message is more informative, but it won't change when the error is raised. I plan on having that ready for the next distribution update, unless you have a different preference on how to handle this situation.
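To make the failure mode concrete, here is a minimal sketch that reproduces the same IndexError outside of scalecast. It assumes the metric comes back as NaN when the actuals contain a 0; `mape_or_nan` and the toy grid are hypothetical stand-ins, and only the final lookup mirrors the `_find_best_params` lines shown in the traceback.

```python
import numpy as np
import pandas as pd

def mape_or_nan(actuals, preds):
    """Hypothetical helper: MAPE that yields NaN if any actual is 0."""
    a = np.asarray(actuals, dtype=float)
    p = np.asarray(preds, dtype=float)
    if np.any(a == 0):
        return np.nan
    return float(np.mean(np.abs((a - p) / a)))

# Toy validation set containing a 0, and a toy one-column grid (assumptions).
actuals = [0.0, 10.0, 20.0]
grid = pd.DataFrame({"theta": [0.5, 1.0, 2.0]})
grid_evaluated = grid.copy()
grid_evaluated["metric_value"] = [
    mape_or_nan(actuals, [1.0, 9.0, 21.0]) for _ in range(len(grid))
]

# Same lookup pattern as in the traceback: every metric is NaN, so
# min() is NaN, NaN == NaN is False, and the filtered grid is empty.
best = grid_evaluated["metric_value"].min()
mask = grid_evaluated["metric_value"] == best
try:
    best_idx = grid.loc[mask].index.to_list()[0]
    error_name = None
except IndexError as exc:
    error_name = type(exc).__name__

print(error_name)
```

If the series contains 0s, leaving the validation metric at its default (rmse) or calling `f.set_validation_metric('rmse')` sidesteps the problem.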
Tangentially, the theta model isn't able to evaluate with multiplicative seasonality and exponential trends when there are non-positive numbers in the training set. It can be annoying to have to remove those options from your grid, as they can still be the optimal parameters. A way around this could be to use the SeriesTransformer object:
import numpy as np
import pandas as pd
from scalecast.Forecaster import Forecaster
from scalecast.SeriesTransformer import SeriesTransformer
def trans_func(x,ymin):
    return [xi - ymin + 1 for xi in x]
def revert_func(x,ymin):
    return [xi + ymin - 1 for xi in x]
col_date = 'BillingDate'
col_val = 'TotWAC'
# data
url = "https://github.com/bhishanpdl/Shared/blob/master/data/data_scalecast/df_train.csv"
dfs = pd.read_html(url)
df_train = dfs[0].iloc[:,1:]
df_train[col_date] = pd.to_datetime(df_train[col_date])
y = df_train[col_val].to_list()
current_dates = df_train[col_date].to_list()
f = Forecaster(y=y,current_dates=current_dates)
ymin = f.y.min()
transformer = SeriesTransformer(f)
f = transformer.Transform(trans_func,ymin=ymin)
f.set_test_length(.2)
f.generate_future_dates(90)
f.set_validation_metric('mape')
from darts.utils.utils import SeasonalityMode, TrendMode, ModelMode
theta_grid = {
    'theta':[0.5,1,1.5,2,2.5,3],
    'model_mode':[
        ModelMode.ADDITIVE,
        ModelMode.MULTIPLICATIVE
    ],
    'season_mode':[
        SeasonalityMode.MULTIPLICATIVE,
        SeasonalityMode.ADDITIVE,
    ],
    'trend_mode':[
        TrendMode.EXPONENTIAL,
        TrendMode.LINEAR
    ],
}
f.set_estimator('theta')
f.ingest_grid(theta_grid)
f.cross_validate(k=3)
f.auto_forecast()
# revert back to original values
f = transformer.Revert(revert_func,full=True,ymin=ymin)
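As a quick sanity check on the two helper functions, the sketch below runs them on a small made-up list (the values are illustrative, not taken from the dataset). It shows that the shift moves the minimum to 1, so every value is strictly positive, and that reverting recovers the original numbers.

```python
def trans_func(x, ymin):
    # Shift so the smallest value becomes 1 (strictly positive series).
    return [xi - ymin + 1 for xi in x]

def revert_func(x, ymin):
    # Undo the shift applied by trans_func.
    return [xi + ymin - 1 for xi in x]

# Made-up sample with negative and zero values.
y = [-3.0, 0.0, 2.5, 7.0]
ymin = min(y)

shifted = trans_func(y, ymin)
restored = revert_func(shifted, ymin)

print(shifted)   # [1.0, 4.0, 6.5, 11.0]
print(restored)  # [-3.0, 0.0, 2.5, 7.0]
```

One thing to keep in mind: a percentage metric such as mape computed on the shifted series is not the same number as mape on the original scale, because the denominators change.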
Let me know if this answers the concern.
Thanks a lot for introducing the transformers. I was using PyCaret with prophet, xgboost, lightgbm, and catboost and was getting a 21% absolute percent difference in the total sum over the last 90 days; with this theta regressor and the transformer I got 19%.
I am really impressed with this module and the code is running fine now.
I have a follow up question though,
Follow up Question: How to suppress the warning ?
WARNING:darts.models.forecasting.theta:Time series has negative values. Fallback to additive and linear model
I get a lot of lines with this warning.
Thanks a lot for the quick response, and kudos for the hard work of developing such a great module.
Never mind, I looked up the WARNING on the darts GitHub, and there is already an issue raised for it there. For now, I can suppress the warnings (and anything else a cell prints) using the cell magic %%capture in Jupyter notebook. We can close the issue.
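For anyone who wants to silence only this message rather than all cell output, a possible alternative is to raise the level of the logger named in the warning prefix. This is a sketch that assumes darts emits the message through Python's standard logging module under the name `darts.models.forecasting.theta`, as the `WARNING:<name>:` prefix suggests; the demo below uses a stand-in logger with that name and does not import darts.

```python
import logging

logger_name = "darts.models.forecasting.theta"  # taken from the warning prefix
records = []

class ListHandler(logging.Handler):
    """Collects emitted messages so the effect can be inspected."""
    def emit(self, record):
        records.append(record.getMessage())

theta_logger = logging.getLogger(logger_name)
theta_logger.addHandler(ListHandler())

# Before raising the level, a WARNING gets through.
theta_logger.warning("Time series has negative values.")

# Raise the threshold for this logger only; WARNINGs are now dropped.
theta_logger.setLevel(logging.ERROR)
theta_logger.warning("Time series has negative values.")

print(len(records))  # 1
```

In a notebook, only the `getLogger(...).setLevel(logging.ERROR)` line would be needed, placed before the forecasting calls.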
Great! I'm glad you are enjoying the package. Thank you for contributing to help improve it.
The informative error message will be part of the 0.14.0 distribution but per your request, I will close the issue.