Memory leak in TimeSeriesDataSet
- PyTorch-Forecasting version: 0.8.5
- PyTorch version: 1.9.0
- PyTorch Lightning version: 1.4.0
- Python version: 3.8
- Operating System: MacOS 11.4
Expected behavior
When a TimeSeriesDataSet instance is no longer being used, I'd expect the memory it uses to be released.
Actual behavior
Instead, memory seems to accumulate when creating multiple instances of a TimeSeriesDataSet, which is what happens under the hood when calling e.g. the predict method on the TemporalFusionTransformer class with a pandas DataFrame. This causes my deployment that serves predictions with a TemporalFusionTransformer to get OOMKilled after some time.
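For context, here's a sketch of the serving pattern that triggers this (tft_model and incoming_requests are placeholder names, not from my actual deployment code):

# Each predict() call on a raw DataFrame builds a new TimeSeriesDataSet
# under the hood, so memory grows with every request served.
for request_df in incoming_requests:
    predictions = tft_model.predict(request_df)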
Code to reproduce the problem
A minimal example can be found below. Simply run this script locally and monitor your machine's memory usage: memory accumulates over time. Uncommenting the last lines, where some of the attributes are explicitly set to None, seems to alleviate the problem a bit, but does not completely solve it.
import numpy as np
import pandas as pd

from pytorch_forecasting import TimeSeriesDataSet

test_data = pd.DataFrame(
    {
        "value": np.random.rand(3000000) - 0.5,
        "group": np.repeat(np.arange(3), 1000000),
        "time_idx": np.tile(np.arange(1000000), 3),
    }
)

# Memory accumulates when creating `TimeSeriesDataSet`s. Seems like not everything
# is being garbage collected after a `TimeSeriesDataSet` instance is no longer used.
for i in range(100):
    print("Creating dataset ", i)
    dataset = TimeSeriesDataSet(
        test_data,
        group_ids=["group"],
        target="value",
        time_idx="time_idx",
        min_encoder_length=5,
        max_encoder_length=5,
        min_prediction_length=2,
        max_prediction_length=2,
        time_varying_unknown_reals=["value"],
        predict_mode=False,
    )
    # Uncommenting the following lines helps to reduce the memory leak, but does
    # not completely solve it. Some memory is still not released.
    # dataset.index = None
    # dataset.data = None
I met the same problem on Debian 9, with PyTorch-Forecasting version 0.9.1 (the latest version as of now). We can see the memory increasing more clearly via the psutil package (just add a few lines to the code above):
import os

import psutil

...  # same setup as the script above

process = psutil.Process(os.getpid())
for i in range(100):
    print("Creating dataset ", i)
    dataset = TimeSeriesDataSet(
        ...
    )
    proc_mem = process.memory_info().rss / (1024 ** 2)
    print("{} {} MB".format(i, proc_mem))
So, I have the same problem.
While digging into the code, I found that the cause is probably the unbounded growth of the functools LRU (Least Recently Used) cache, which is used in several TimeSeriesDataSet methods.
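To illustrate the mechanism with a standalone sketch (a toy class, not pytorch-forecasting code): when lru_cache decorates a method, every cache entry keys on self, so the cache holds a reference to the instance, and with maxsize=None nothing is ever evicted:

import functools

class Holder:
    def __init__(self):
        self.data = list(range(10**6))  # large payload

    @functools.lru_cache(None)  # unbounded cache, keyed on (self, n)
    def compute(self, n):
        return len(self.data) + n

h = Holder()
h.compute(1)  # the cache now holds a reference to `h` via the key
del h         # `h` is NOT freed: the cache on Holder.compute still references it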
I've found a way to clear it on Stack Overflow; it seems like a workaround rather than a clean solution, but it works ;)
import functools
import gc

...  # same setup as before

for i in range(100):
    # Clear every functools LRU cache in the process before building the next
    # dataset, so entries cached by previous instances can be released.
    gc.collect()
    wrappers = [
        x for x in gc.get_objects()
        if isinstance(x, functools._lru_cache_wrapper)
    ]
    for wrapper in wrappers:
        wrapper.cache_clear()

    print("Creating dataset ", i)
    dataset = TimeSeriesDataSet(
        ...
    )
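A slightly narrower variant of this (my own sketch, not from the Stack Overflow answer) clears only the caches that belong to pytorch_forecasting, leaving other libraries' caches intact:

import functools
import gc

def clear_pf_lru_caches():
    # lru_cache copies __module__ from the wrapped function onto the
    # wrapper, so we can filter on it before clearing.
    for obj in gc.get_objects():
        if isinstance(obj, functools._lru_cache_wrapper):
            module = getattr(obj, "__module__", "") or ""
            if module.startswith("pytorch_forecasting"):
                obj.cache_clear()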
The problem looks like it's the lru_cache(None) usage in
https://github.com/jdb78/pytorch-forecasting/blob/master/pytorch_forecasting/data/timeseries.py

From the Python docs for functools.lru_cache:

"If user_function is specified, it must be a callable. This allows the lru_cache decorator to be applied directly to a user function, leaving the maxsize at its default value of 128:

@lru_cache
def count_vowels(sentence):
    return sum(sentence.count(vowel) for vowel in 'AEIOUaeiou')

If maxsize is set to None, the LRU feature is disabled and **the cache can grow without bound**."
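Given that, a cleaner fix on the library side would presumably be to bound those caches instead of passing maxsize=None. A minimal sketch of the idea (encode_value is a made-up name, not the actual function in timeseries.py):

from functools import lru_cache

@lru_cache(None)          # current pattern: the cache can grow without bound
def encode_value(value):
    ...

@lru_cache(maxsize=1024)  # bounded: least-recently-used entries get evicted
def encode_value_bounded(value):
    ...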
It works! Thanks!