
[BUG] Issue with historical_forecasts method

Open smhoma opened this issue 2 years ago • 3 comments

I ran into this problem with the historical_forecasts method of RNNModel and NBEATSModel. I get "ValueError: conflicting sizes for dimension 'time': length 47 on the data but length 461 on coordinate 'time'". I'm not sure, but I suspect the error happens at the last stride of the method. Changing the forecast_horizon and stride values changes the "47" and "461" in the error message. Here is the traceback:

ValueError                                Traceback (most recent call last)
Input In [29], in <cell line: 1>()
----> 1 hist_forcast = model.historical_forecasts(series=data_trans,
      2                                           num_samples=20,
      3                                           start=0.85,
      4                                           forecast_horizon=5,
      5                                           stride=10,
      6                                           )

File ~\anaconda3\envs\torch39\lib\site-packages\darts\utils\utils.py:172, in _with_sanity_checks.<locals>.decorator.<locals>.sanitized_method(self, *args, **kwargs)
    169     only_args.pop("self")
    171     getattr(self, sanity_check_method)(*only_args.values(), **only_kwargs)
--> 172 return method_to_sanitize(self, *only_args.values(), **only_kwargs)

File ~\anaconda3\envs\torch39\lib\site-packages\darts\models\forecasting\forecasting_model.py:469, in ForecastingModel.historical_forecasts(self, series, past_covariates, future_covariates, num_samples, train_length, start, forecast_horizon, stride, retrain, overlap_end, last_points_only, verbose)
    461         return TimeSeries.from_times_and_values(
    462             pd.DatetimeIndex(last_points_times, freq=series.freq * stride),
    463             np.array(last_points_values),
   (...)
    466             hierarchy=series.hierarchy,
    467         )
    468     else:
--> 469         return TimeSeries.from_times_and_values(
    470             pd.RangeIndex(
    471                 start=last_points_times[0],
    472                 stop=last_points_times[-1] + 1,
    473                 step=1,
    474             ),
    475             np.array(last_points_values),
    476             columns=series.columns,
    477             static_covariates=series.static_covariates,
    478             hierarchy=series.hierarchy,
    479         )
    481 return forecasts

File ~\anaconda3\envs\torch39\lib\site-packages\darts\timeseries.py:934, in TimeSeries.from_times_and_values(cls, times, values, fill_missing_dates, freq, columns, fillna_value, static_covariates, hierarchy)
    931 if columns is not None:
    932     coords[DIMS[1]] = columns
--> 934 xa = xr.DataArray(
    935     values,
    936     dims=(times_name,) + DIMS[-2:],
    937     coords=coords,
    938     attrs={STATIC_COV_TAG: static_covariates, HIERARCHY_TAG: hierarchy},
    939 )
    941 return cls.from_xarray(
    942     xa=xa,
    943     fill_missing_dates=fill_missing_dates,
    944     freq=freq,
    945     fillna_value=fillna_value,
    946 )

File ~\anaconda3\envs\torch39\lib\site-packages\xarray\core\dataarray.py:412, in DataArray.__init__(self, data, coords, dims, name, attrs, indexes, fastpath)
    410 data = _check_data_shape(data, coords, dims)
    411 data = as_compatible_data(data)
--> 412 coords, dims = _infer_coords_and_dims(data.shape, coords, dims)
    413 variable = Variable(dims, data, attrs, fastpath=True)
    414 indexes, coords = _create_indexes_from_coords(coords)

File ~\anaconda3\envs\torch39\lib\site-packages\xarray\core\dataarray.py:160, in _infer_coords_and_dims(shape, coords, dims)
    158 for d, s in zip(v.dims, v.shape):
    159     if s != sizes[d]:
--> 160         raise ValueError(
    161             f"conflicting sizes for dimension {d!r}: "
    162             f"length {sizes[d]} on the data but length {s} on "
    163             f"coordinate {k!r}"
    164         )
    166 if k in sizes and v.shape != (sizes[k],):
    167     raise ValueError(
    168         f"coordinate {k!r} is a DataArray dimension, but "
    169         f"it has shape {v.shape!r} rather than expected shape {sizes[k]!r} "
    170         "matching the dimension size"
    171     )

ValueError: conflicting sizes for dimension 'time': length 47 on the data but length 461 on coordinate 'time'
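
One possible reading of the two numbers in the message, offered as a sketch rather than a confirmed diagnosis: with stride=10, 47 forecasted points span (47 - 1) * 10 + 1 = 461 integer time steps, which is the length of a pd.RangeIndex built with step=1 as in the frame at forecasting_model.py:469. The helper below is hypothetical (it is not part of darts) and uses only pandas and NumPy to compare the number of values with the index length for step=1 and for step=stride.

```python
import numpy as np
import pandas as pd


def index_lengths(n_forecasts: int, stride: int, first_time: int = 0):
    """Hypothetical helper: compare data length with RangeIndex lengths.

    Returns (number of values, len(index with step=1), len(index with step=stride)).
    """
    # Assumed integer time stamps of the last forecasted points, one every `stride` steps.
    last_points_times = [first_time + i * stride for i in range(n_forecasts)]
    last_points_values = np.zeros((n_forecasts, 1, 1))

    # Index built the way the traceback shows it (step=1).
    step_1 = pd.RangeIndex(
        start=last_points_times[0], stop=last_points_times[-1] + 1, step=1
    )
    # Same bounds, but stepping by the stride (an assumption, for comparison only).
    step_stride = pd.RangeIndex(
        start=last_points_times[0], stop=last_points_times[-1] + 1, step=stride
    )
    return len(last_points_values), len(step_1), len(step_stride)


print(index_lengths(47, 10))  # (47, 461, 47)
print(index_lengths(47, 1))   # (47, 47, 47)
```

Under these assumptions, the step=1 index matches the data only when stride is 1; for stride=10 it reproduces the 47 versus 461 mismatch from the error message.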

To Reproduce: here is the code I ran in Jupyter:

import pandas as pd
import numpy as np

import matplotlib.pyplot as plt
import yfinance as yf

from darts.timeseries import TimeSeries
from darts.models import NBEATSModel, RNNModel
from darts.utils.likelihood_models import GaussianLikelihood
from darts.dataprocessing.transformers import Scaler
from darts.metrics import mae, mape

from pytorch_lightning.callbacks import EarlyStopping, ModelCheckpoint


df = yf.download("GOOG", start="2010-01-01", end="2022-06-30", period="1d", progress=False)
df = df.reset_index()
data = TimeSeries.from_dataframe(df, value_cols=["Close"], fill_missing_dates=False) 
data = data.astype(np.float32)
scalar_ts = Scaler()
data_trans = scalar_ts.fit_transform(data)
train, val = data_trans.split_after(0.85)

model_checkpoint_callback = ModelCheckpoint(filename="best_darts_gru_model_cb",
                                            save_top_k=1,
                                            monitor='train_loss',
                                            mode='min')
early_stopping_callback = EarlyStopping(monitor='train_loss',
                                        patience=10,
                                        mode='min',
                                        check_finite=False)

model = RNNModel(input_chunk_length=7,
                 model="GRU",
                 hidden_dim=50,
                 n_rnn_layers=2,
                 n_epochs=500,
                 training_length=7,
                 likelihood=GaussianLikelihood(),
                 model_name="GRU_Darts",
                 work_dir="optimization",
                 save_checkpoints=True,
                 pl_trainer_kwargs={"callbacks":[model_checkpoint_callback, early_stopping_callback],
                                    "accelerator": "gpu",
                                    "gpus": [0]
                                    },
                 force_reset=True)

model.fit(train, val_series=val)

hist_forcast = model.historical_forecasts(series=data_trans,
                                          num_samples=20,
                                          start=0.85,
                                          forecast_horizon=5,
                                          stride=10,
                                          )

System (please complete the following information):

  • Python version: 3.9.13
  • darts version: 0.20.0

smhoma, Aug 12 '22 12:08

Hi, I used "stride=1" this time and the method ran to completion without any error. It seems the error only occurs for stride>1.

smhoma, Aug 13 '22 14:08

Hi all, I'm running into the same error :)

DeastinY, Aug 30 '22 13:08

Thanks for reporting, we'll have to have a look

hrzn, Aug 31 '22 13:08

Hi,

I tried to replicate your issue with the latest darts version (0.22.0) and could not reproduce it. Could you please try to update the library and see if the problem still occurs? Thank you.
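
For anyone checking which version they are on before retrying, here is a minimal sketch using only the Python standard library (Python 3.8+). The helper name installed_version is hypothetical; it simply wraps importlib.metadata.version and returns None when the package is not installed.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def installed_version(package: str) -> Optional[str]:
    """Return the installed version string of `package`, or None if it is missing."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


# Prints the installed darts version, or None if darts is not installed.
# Upgrading is typically done with: pip install --upgrade darts
print("darts:", installed_version("darts"))
```

The original report was filed against 0.20.0, so a value below 0.22.0 here would mean the retry suggested above has not been done yet.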

madtoinou, Nov 02 '22 09:11

Closing - feel free to re-open if the issue is spotted again.

hrzn, Jan 05 '23 15:01