
Memory limit for massive amounts of time series

Open · MalteFlender opened this issue 6 years ago · 2 comments

It seems to me that at the moment the RAM of the computer I'm using is the limiting factor for the amount of time series the system can be trained on. If I want to train the system on, e.g., 16 GB of time series data, I need at least 16 GB of RAM.

Is there a way to get around this issue? Maybe it is possible to train the system in smaller batches or to use some kind of iterator. I'm trying to train the system on a large amount of data obtained from a database, from which I retrieve the time series in small chunks.

MalteFlender avatar Dec 26 '18 09:12 MalteFlender
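For the loading side, one way to keep memory bounded is to stream the rows out of the database in fixed-size chunks rather than reading everything at once. The sketch below uses DBI's chunked `dbFetch()`; the table layout, column names, and the `list(x = ts, h = horizon)` series format are assumptions for illustration, not necessarily the package's actual input format.

```r
# Sketch only: streaming series from a database in chunks with DBI, so that
# only one chunk of rows is held in memory at a time. Table/column names and
# the list(x = ts, h = horizon) series format are illustrative assumptions.
library(DBI)

con <- dbConnect(RSQLite::SQLite(), "timeseries.db")
res <- dbSendQuery(con,
  "SELECT series_id, obs_time, value FROM observations ORDER BY series_id, obs_time")

while (!dbHasCompleted(res)) {
  rows <- dbFetch(res, n = 50000)              # pull 50k rows per round trip
  chunk <- lapply(split(rows$value, rows$series_id), function(v) {
    list(x = ts(v, frequency = 12), h = 18)    # assumed monthly data, horizon 18
  })
  # ... process `chunk` (features, forecasts, ...) and discard it here;
  # series that straddle a chunk boundary would need stitching, omitted for brevity.
}

dbClearResult(res)
dbDisconnect(con)
```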

Thank you. Do you mean that you get some kind of error when running on your data, or is it just extremely slow? There is a known problem when running the code in parallel with large amounts of data, specifically when calculating the forecasts. You can try smaller batches for that part; the part that relies on xgboost should then be able to handle relatively larger datasets.

The parallelization problem of the forecasting part will be fixed soon

pmontman avatar Feb 08 '19 05:02 pmontman
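The "smaller batches" suggestion for the forecast step could look roughly like the sketch below. This is only an illustration: the `calc_forecasts()` name, its signature, and the forecasting method names are assumed from the package documentation, so verify them against the installed version.

```r
# Sketch only: run the memory/parallelism-heavy forecast step over the dataset
# in batches instead of all at once. calc_forecasts() and the method names are
# assumptions taken from the package documentation, not verified here.
library(M4metalearning)

forecast_in_batches <- function(dataset, batch_size = 500, n.cores = 1) {
  groups <- split(seq_along(dataset), ceiling(seq_along(dataset) / batch_size))
  out <- vector("list", length(dataset))
  for (g in groups) {
    out[g] <- calc_forecasts(dataset[g],
                             c("auto_arima_forec", "ets_forec", "thetaf_forec"),
                             n.cores = n.cores)
    gc()  # encourage R to release memory between batches
  }
  out
}
```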

Currently I'm not using the system (I'm planning to). Since the training is done in one single step, it seems to me that there is no way to split the training into several parts, and therefore no way to process training sets that are larger than my current RAM, since that is where the data has to be held.

MalteFlender avatar Feb 19 '19 15:02 MalteFlender
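For what it's worth, plain xgboost in R can continue training an existing booster via the `xgb_model` argument of `xgb.train()`, so the classifier part could in principle be fed batch by batch. The sketch below is not the package's training pipeline; the saved feature chunks and their layout are hypothetical.

```r
# Sketch only: continuing an xgboost model across data batches via xgb_model.
# The chunk files and their list(features, labels) layout are hypothetical;
# this is NOT the M4metalearning training routine.
library(xgboost)

model <- NULL
for (path in chunk_paths) {                    # hypothetical vector of .rds chunk files
  feats <- readRDS(path)
  dtrain <- xgb.DMatrix(feats$features, label = feats$labels)
  model <- xgb.train(params = list(objective = "multi:softprob",
                                   num_class = 9),  # e.g. 9 candidate methods
                     data = dtrain, nrounds = 50,
                     xgb_model = model)         # resume from the previous batch's model
}
```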