pmdarima

Memory leak?

Open animalmutch opened this issue 4 years ago • 2 comments

Hi. I would like to build a rolling forecaster by updating my ARIMA model with new data as it becomes available. However, it seems that this will inevitably lead to a memory leak, since more and more data are added to the model and, as far as I can see, there is no way to drop old data that are no longer relevant. I've considered creating a new model from some of the old model's endog data when nobs_ gets too big, but this would require using undocumented properties. These aren't protected properties, at least not by PEP 8 conventions (they have trailing underscores, which usually signal avoidance of a conflict with a Python keyword, although I don't see such a conflict in many of these cases), but I'm a little concerned about relying on undocumented and potentially non-public properties. Is there any way to achieve what I'm trying to do using ARIMA's public interfaces? If not, do you plan to support this use case in future? Thanks

animalmutch, Apr 30 '21 12:04

Sounds like there are several issues here, and the core desire is to keep the number of observations in the model from growing too large. You're correct that trimming internal data structures is likely to lead to some unpredictable and nasty behavior. A couple questions:

  • Do you have reason to believe that the number of observations in your model is causing a significant impact on memory consumption? (How many observations are there?)
  • Is there a reason you wouldn't want to re-fit the model on the new window? The start_params argument would allow you to set the starting coefficients of your new model to the existing model's, and you could simply fit with only a few steps (maxiter) over the new window.
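A minimal sketch of the warm-start re-fit described in the second question, under stated assumptions. The names `refit_on_window` and `make_model` are hypothetical and not part of pmdarima. `make_model(start_params, maxiter)` is assumed to return an unfitted estimator with `fit(y)` and `params()`; with pmdarima that could be something like `lambda sp, mi: pm.ARIMA(order=(1, 1, 1), start_params=sp, maxiter=mi)`, where the order is picked purely for illustration. The iteration counts are placeholders too.

```python
from typing import Callable, Optional, Sequence


def refit_on_window(
    y: Sequence[float],
    window: int,
    make_model: Callable[[Optional[Sequence[float]], int], object],
    prev_model: Optional[object] = None,
    warm_maxiter: int = 5,    # assumed value: "only a few steps"
    cold_maxiter: int = 50,   # assumed value for a fit from scratch
):
    """Fit a fresh model on the last `window` observations of `y`.

    If `prev_model` is given, its coefficients seed the optimizer and only
    a few iterations are run; otherwise a normal (cold) fit is done.
    """
    recent = list(y)[-window:]  # drop observations older than the window
    if prev_model is None:
        model = make_model(None, cold_maxiter)
    else:
        # Reuse the previous coefficients as starting values.
        model = make_model(prev_model.params(), warm_maxiter)
    model.fit(recent)
    return model
```

The old model is never trimmed; it is simply replaced, so no internal state is touched. Seeding from the previous coefficients only makes sense when the new model has the same order as the old one, since the parameter vector must have the same length.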

tgsmith61591, May 17 '21 14:05

Thanks @tgsmith61591. I think that would be the ideal way of doing things. I believe I could just get the start_params for the new model from the original model's params() method. However, I guess I'd want to do this only once the data in the model reached a certain length. Is there a public method for finding the number of observations currently held by the model?
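One way to sidestep reading the length back from the model would be to track it in the calling code. The sketch below is an assumption-laden illustration, not a pmdarima feature: `ObservationWindow`, `max_len`, and `refit_after` are hypothetical names, and the class only does bookkeeping alongside whatever model is being updated.

```python
from collections import deque
from typing import Iterable, List


class ObservationWindow:
    """Keep the newest `max_len` observations and count what the model holds."""

    def __init__(self, max_len: int, refit_after: int) -> None:
        self._buffer = deque(maxlen=max_len)  # old values fall off the left
        self._refit_after = refit_after
        self.model_nobs = 0  # observations assumed to be in the current model

    def extend(self, new_obs: Iterable[float]) -> None:
        """Record a batch that was also passed to the model."""
        batch = list(new_obs)
        self._buffer.extend(batch)
        self.model_nobs += len(batch)

    def needs_refit(self) -> bool:
        """True once the model has accumulated `refit_after` observations."""
        return self.model_nobs >= self._refit_after

    def window(self) -> List[float]:
        """The most recent observations, oldest first."""
        return list(self._buffer)

    def mark_refit(self) -> None:
        # A fresh model fitted on window() holds exactly that many points.
        self.model_nobs = len(self._buffer)
```

In use, each new batch would go both to the model's update call and to `extend`; when `needs_refit()` returns true, a new model is fitted on `window()` and `mark_refit()` resets the count. For this to work, `refit_after` should be larger than `max_len`, otherwise a refit is requested again immediately.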

animalmutch, May 19 '21 12:05