
Issue-409 Add support for datasets that can't fit in memory

Open jasminerienecker opened this issue 8 months ago • 4 comments

As described in this issue: https://github.com/Nixtla/neuralforecast/issues/409

We assume the dataset is split across multiple parquet files, where each parquet file contains a single time series stored as a pandas dataframe. This PR adds a new Dataset class whose __getitem__ method reads the parquet file corresponding to the requested index, and a from_data_directory() method that mirrors the existing from_df() method.
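For readers skimming the PR, here is a minimal sketch of the idea (not the PR's actual code, which lives in neuralforecast's core module): the class name ParquetFilesDataset, the column layout of the files, and the sorted glob are assumptions made for illustration.

```python
# Illustrative sketch only; the real implementation in the PR may differ.
from pathlib import Path

import pandas as pd
from torch.utils.data import Dataset


class ParquetFilesDataset(Dataset):  # hypothetical name for illustration
    """Lazily reads one time series per parquet file instead of holding all data in memory."""

    def __init__(self, files):
        self.files = list(files)

    @classmethod
    def from_data_directory(cls, directory):
        # Mirrors the spirit of from_df(): discover per-series parquet files under `directory`.
        files = sorted(Path(directory).glob("*.parquet"))
        return cls(files)

    def __len__(self):
        return len(self.files)

    def __getitem__(self, idx):
        # Only the file for this index is read, so the full dataset never sits in memory.
        return pd.read_parquet(self.files[idx])
```

Keeping __getitem__ as the only place that touches disk is what allows datasets larger than RAM: each worker reads just the series it needs for the current batch.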

I have added a test to the end of core.ipynb that checks that forecasts produced with this file-backed dataset match those produced when the same data is passed in directly as a pandas dataframe.

jasminerienecker avatar Jul 01 '24 03:07 jasminerienecker