
Memory leak in TimeSeriesDataSet

TomSteenbergen opened this issue · 4 comments

  • PyTorch-Forecasting version: 0.8.5
  • PyTorch version: 1.9.0
  • PyTorch Lightning version: 1.4.0
  • Python version: 3.8
  • Operating System: MacOS 11.4

Expected behavior

When a TimeSeriesDataSet instance is no longer being used, I'd expect the memory it uses to be released.

Actual behavior

Instead, memory seems to accumulate when creating multiple instances of a TimeSeriesDataSet, which is what happens under the hood when calling e.g. the predict method on the TemporalFusionTransformer class with a pandas DataFrame. This causes my deployment that serves predictions using a TemporalFusionTransformer to get OOMKilled after some time.

Code to reproduce the problem

A minimal example can be found below. Simply run this script locally and monitor your machine's memory usage: memory accumulates over time. Uncommenting the last lines, where some of the attributes are explicitly set to None, seems to alleviate the problem a bit, but does not completely solve it.

import numpy as np
import pandas as pd
from pytorch_forecasting import TimeSeriesDataSet

test_data = pd.DataFrame(
    {
        "value": np.random.rand(3000000) - 0.5,
        "group": np.repeat(np.arange(3), 1000000),
        "time_idx": np.tile(np.arange(1000000), 3),
    }
)

# Memory accumulates when creating `TimeSeriesDataSet`s. Seems like not everything is
# being garbage collected after a `TimeSeriesDataSet` instance is no longer used.
for i in range(100):
    print("Creating dataset ", i)
    dataset = TimeSeriesDataSet(
        test_data,
        group_ids=["group"],
        target="value",
        time_idx="time_idx",
        min_encoder_length=5,
        max_encoder_length=5,
        min_prediction_length=2,
        max_prediction_length=2,
        time_varying_unknown_reals=["value"],
        predict_mode=False
    )
    
    # Uncommenting the following lines helps to reduce the memory leak, but does not
    # completely solve it. Some memory is still not released.
    # dataset.index = None
    # dataset.data = None

TomSteenbergen avatar Aug 19 '21 07:08 TomSteenbergen

I ran into the same problem on Debian 9 with PyTorch-Forecasting 0.9.1 (the latest version at the time of writing). The memory growth can be observed more clearly with the psutil package; just add a few lines to the code above:

import os

import psutil

# ... same data and imports as in the example above ...

process = psutil.Process(os.getpid())
for i in range(100):
    print("Creating dataset ", i)
    dataset = TimeSeriesDataSet(
        ...  # same arguments as in the example above
    )
    proc_mem = process.memory_info().rss / (1024 ** 2)  # resident set size in MB
    print("{} {} MB".format(i, proc_mem))

kr11 avatar Nov 08 '21 03:11 kr11

So, I have the same problem.

While digging into the code, I found that the cause is probably the functools LRU (Least Recently Used) cache, which is used on several TimeSeriesDataSet methods and keeps growing...

I found a way to clear it on Stack Overflow. It seems like a workaround rather than a clean solution, but it works ;)

import functools
import gc

# ... same data and imports as in the example above ...

for i in range(100):
    # Clear every lru_cache reachable via the garbage collector, so cached
    # references to previously created datasets are released.
    gc.collect()
    wrappers = [
        x for x in gc.get_objects()
        if isinstance(x, functools._lru_cache_wrapper)
    ]
    for wrapper in wrappers:
        wrapper.cache_clear()

    print("Creating dataset ", i)
    dataset = TimeSeriesDataSet(
        ...  # same arguments as in the example above
    )
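
In a long-running prediction service (like the deployment mentioned in the original report), the same workaround can be wrapped in a small helper and called between requests. This is only a rough sketch of that idea; the helper name clear_lru_caches is made up here, and note that it also empties caches owned by other libraries:

import functools
import gc


def clear_lru_caches() -> int:
    """Clear every functools.lru_cache wrapper reachable via the GC.

    A blunt instrument: it empties caches owned by other libraries too,
    so only call it where a cold cache is acceptable, e.g. between
    prediction requests. Returns the number of caches cleared.
    """
    gc.collect()
    cleared = 0
    for obj in gc.get_objects():
        if isinstance(obj, functools._lru_cache_wrapper):
            obj.cache_clear()
            cleared += 1
    return cleared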

romanzes637 avatar Nov 20 '22 10:11 romanzes637

The problem looks like it is caused by lru_cache(None):

https://github.com/jdb78/pytorch-forecasting/blob/master/pytorch_forecasting/data/timeseries.py

From the functools.lru_cache documentation:

If user_function is specified, it must be a callable. This allows the lru_cache decorator to be applied directly to a user function, leaving the maxsize at its default value of 128:

@lru_cache
def count_vowels(sentence):
    return sum(sentence.count(vowel) for vowel in 'AEIOUaeiou')

If maxsize is set to None, the LRU feature is disabled and **the cache can grow without bound**.
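
To see why an unbounded cache on methods can leak whole datasets, here is a minimal, self-contained sketch (unrelated to pytorch-forecasting internals): lru_cache(maxsize=None) applied to an instance method stores self as part of each cache key, so the instance and everything it holds is never garbage collected.

import functools
import gc


class Holder:
    def __init__(self):
        # Stand-in for a large dataset: roughly 50 MB per instance.
        self.payload = bytearray(50 * 1024 * 1024)

    @functools.lru_cache(maxsize=None)  # unbounded cache, keyed on (self, x)
    def compute(self, x):
        return x * 2


for i in range(10):
    h = Holder()
    h.compute(1)
    del h
    gc.collect()
    # Each Holder is still referenced from compute's cache, so about
    # 50 MB is retained per iteration even though `h` was deleted.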

shaoxt avatar Mar 15 '24 07:03 shaoxt

> The problem looks like it is caused by lru_cache(None) … If maxsize is set to None, the LRU feature is disabled and the cache can grow without bound.

It works! Thanks!

IlIlllIIllIIlll avatar Aug 15 '24 01:08 IlIlllIIllIIlll