
GPU Runs Out of Memory for TFT When Doing Inference

Open YojoNick opened this issue 3 years ago • 4 comments

PyTorch-Forecasting version: 0.9.0
PyTorch version: 1.9.0+cu111
Python version: 3.6.9
Operating System: Ubuntu 18.04

Expected behavior

When executing predict() for the Temporal Fusion Transformer on a dataset, I expected GPU memory utilization to remain static (i.e., not grow over time).

Actual behavior

However, when doing inference on a dataset with the Temporal Fusion Transformer, GPU memory utilization grew over time until I received an out-of-GPU-memory error.

Code to reproduce the problem

```python
from pytorch_forecasting import TemporalFusionTransformer

# checkPointFile: trained TFT checkpoint; testSetDataLoader: built via TimeSeriesDataSet.to_dataloader(train=False, ...)
tft = TemporalFusionTransformer.load_from_checkpoint(checkPointFile)
tft.to("cuda")
raw_predictions, inputs = tft.predict(testSetDataLoader, mode="raw", return_x=True, show_progress_bar=True)
```

I tried setting the data loader's number of workers to 4 and then to 0; both still gave me an out-of-memory error. I also tried decreasing the batch size.

If I reduce the size of the test TimeSeriesDataSet, I don't get an out-of-memory error. However, since I'm using a data loader with a small mini-batch size, I wouldn't expect to run out of memory, given that inference runs one mini batch at a time...
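One way to check whether the forward pass itself leaks, or whether the growth comes from predict() accumulating results on the GPU, is a manual loop that discards each batch's output. A rough diagnostic sketch, reusing tft and testSetDataLoader from the snippet above:

```python
import torch

tft.eval()
with torch.no_grad():
    for batch_idx, (x, y) in enumerate(testSetDataLoader):
        # move the input dict to the GPU (values are tensors for a TimeSeriesDataSet loader)
        x = {name: xi.to("cuda") if isinstance(xi, torch.Tensor) else xi
             for name, xi in x.items()}
        tft(x)  # forward pass only; the output is discarded on purpose
        # if allocated memory still climbs here, the leak is in the forward pass itself
        print(f"batch {batch_idx}: {torch.cuda.memory_allocated() / 2**20:.1f} MiB allocated")
```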

YojoNick avatar Nov 28 '21 17:11 YojoNick

@YojoNick the same thing happens to me after I override the create_log method of TemporalFusionTransformer.
Is your TemporalFusionTransformer modified too?

moeiniamir avatar Jan 08 '22 14:01 moeiniamir

Any suggestions for solving this issue? I get the same problem when predicting on a large dataset.

hschmiedt avatar Mar 28 '22 14:03 hschmiedt

You are storing the entire output in memory, which can become very large, particularly in "raw" mode. Depending on your use case, you might want to write it to disk instead.
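A minimal sketch of that idea (hypothetical file naming; it assumes the model and loader from the snippet above, and that the raw output supports .items() the way pytorch-forecasting's OutputMixIn does):

```python
import torch

tft.eval()
with torch.no_grad():
    for batch_idx, (x, y) in enumerate(testSetDataLoader):
        x = {name: xi.to("cuda") if isinstance(xi, torch.Tensor) else xi
             for name, xi in x.items()}
        out = tft(x)
        # keep only tensor fields, detach and move them to CPU, then persist per batch
        torch.save(
            {name: field.detach().cpu() for name, field in out.items()
             if isinstance(field, torch.Tensor)},
            f"raw_predictions_batch_{batch_idx:05d}.pt",  # illustrative path
        )
```

This way only one batch's predictions ever live in memory at a time, and the saved files can be reloaded and concatenated offline.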

jdb78 avatar Mar 28 '22 15:03 jdb78

I think this is a small bug that occurs when predicting in raw mode.

In pytorch_forecasting/utils.py, the move_to_device function should be changed to:

```python
elif isinstance(x, OutputMixIn):
    x = x.__class__(**{name: move_to_device(xi, device=device) for name, xi in x.items()})
    return x
```

Currently, the tensors moved to the CPU are never assigned back, so the originals stay on the GPU.
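The underlying behavior is easy to demonstrate in isolation: Tensor.to() returns a new tensor rather than moving the original in place, so discarding its return value changes nothing. A standalone snippet (requires a CUDA device):

```python
import torch

t = torch.zeros(3, device="cuda")
t.to("cpu")          # returns a new CPU tensor, which is immediately discarded
print(t.device)      # cuda:0 -- the original tensor never moved
t = t.to("cpu")      # assigning the result is what actually replaces the GPU tensor
print(t.device)      # cpu
```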

oliverester avatar Aug 26 '22 06:08 oliverester