decord
GPU memory leak
I am decoding a list of videos with:
video = VideoReader(str(video_path), ctx=gpu(0))
frame_ids = list(range(300))
frames = video.get_batch(frame_ids).asnumpy()
On every iteration, GPU RAM consumption goes up until I get an out-of-memory error.
The leak is present even without .asnumpy().
I use:
frames = torch.utils.dlpack.from_dlpack(video.get_batch(frame_ids).to_dlpack())
The frames are located on the GPU in this case.
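For reference, a minimal self-contained version of that zero-copy conversion (the file path here is just a placeholder):

import torch
from torch.utils.dlpack import from_dlpack
from decord import VideoReader, gpu

# Placeholder path; substitute your own video file.
video = VideoReader('video.mkv', ctx=gpu(0))
frame_ids = list(range(300))

# DLPack hands the decoded batch to PyTorch without a host copy,
# so the resulting tensor stays on the GPU.
frames = from_dlpack(video.get_batch(frame_ids).to_dlpack())
print(frames.device, frames.shape)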
Can you guys post your CUDA version / device type? I tried on my local machine with a 1070 Ti and CUDA 10.1.243 and didn't notice any memory leak.
from decord import VideoReader
from decord import cpu, gpu
video_path = '/home/joshua/Dev/decord/examples/flipping_a_pancake.mkv'
video = VideoReader(str(video_path), ctx=gpu(0))
frame_ids = list(range(300))
for i in range(100):
    frames = video.get_batch(frame_ids).asnumpy()
    if i % 10 == 0:
        print(frames.shape)
The nvidia-smi record can be viewed here: https://asciinema.org/a/xgI8tFXNlpAoDcVJdxLgag8eW
GPU memory goes from 627 MB to 845 MB and stays fairly constant.
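If it helps to compare numbers, here is a rough way to read used GPU memory from inside the loop (this assumes nvidia-smi is on the PATH; the helper name is made up):

import subprocess

def gpu_mem_used_mib(device_index=0):
    # Query used memory (in MiB) for one GPU via nvidia-smi.
    out = subprocess.check_output([
        'nvidia-smi', '-i', str(device_index),
        '--query-gpu=memory.used',
        '--format=csv,noheader,nounits',
    ])
    return int(out.decode().strip())

# e.g. print(gpu_mem_used_mib(0)) every few iterations of the decode loop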
Thanks @leigh-plt, after the dlpack trick the modified batch loading below seems to be OK in the Kaggle notebook environment:
import gc
from decord import VideoReader, gpu
from torch.utils.dlpack import from_dlpack

def get_decord_video_batch(fname, sz, freq=10):
    "get batch tensor for inference, original for cropping and H,W of video"
    video = VideoReader(str(fname), ctx=gpu())
    # data = video.get_batch(range(0, len(video), freq))
    data = from_dlpack(video.get_batch(range(0, len(video), freq)).to_dlpack())
    H, W = data.shape[1:3]  # get_batch returns frames in (N, H, W, C) layout
    del video; gc.collect()
    return (data, None, (H, W))
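A hypothetical call site, just to show how it gets used for inference (the file name is a placeholder):

# sz is unused here; freq controls the sampling stride between frames
batch, _, (H, W) = get_decord_video_batch('video.mp4', sz=None, freq=10)
print(batch.shape, H, W)  # batch is a CUDA tensor in (N, H, W, C) layout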
Although I had one successful run, there have been unsuccessful runs since. How can we fix it?
I'm facing memory leak issues on CPU; on GPU it is working fine.
I'm facing the same issue; neither the dlpack trick nor asnumpy works for me.
@KeremTurgutlu I'm on the Kaggle environment as well.
Can you guys post the Kaggle GPU and CUDA version?
The GPU is a Tesla P100-PCIE-16GB.
CUDA is 10.0.130.
This is the traceback:
[16:44:39] /kaggle/working/reader/src/video/nvcodec/cuda_threaded_decoder.cc:55: Kernel module version 418.67, so using our own stream.
7%|▋ | 27/400 [00:23<03:55, 1.58it/s][16:44:40] /kaggle/working/reader/src/video/nvcodec/cuda_threaded_decoder.cc:35: Using device: Tesla P100-PCIE-16GB
[16:44:40] /kaggle/working/reader/src/video/nvcodec/cuda_threaded_decoder.cc:55: Kernel module version 418.67, so using our own stream.
7%|▋ | 28/400 [00:24<03:51, 1.61it/s][16:44:41] /kaggle/working/reader/src/video/nvcodec/cuda_threaded_decoder.cc:35: Using device: Tesla P100-PCIE-16GB
[16:44:41] /kaggle/working/reader/src/video/nvcodec/cuda_threaded_decoder.cc:55: Kernel module version 418.67, so using our own stream.
terminate called after throwing an instance of 'dmlc::Error'
what(): [16:44:41] /kaggle/working/reader/src/video/nvcodec/cuda_threaded_decoder.cc:332: Check failed: arr.defined()
Stack trace returned 10 entries:
[bt] (0) /kaggle/working/reader/build/libdecord.so(dmlc::StackTrace[abi:cxx11](unsigned long)+0x85) [0x7f15712ee059]
[bt] (1) /kaggle/working/reader/build/libdecord.so(dmlc::LogMessageFatal::~LogMessageFatal()+0x20) [0x7f15712ee334]
[bt] (2) /kaggle/working/reader/build/libdecord.so(decord::cuda::CUThreadedDecoder::ConvertThread()+0x1a5) [0x7f157135d659]
[bt] (3) /kaggle/working/reader/build/libdecord.so(void std::__invoke_impl<void, void (decord::cuda::CUThreadedDecoder::* const&)(), decord::cuda::CUThreadedDecoder*>(std::__invoke_memfun_deref, void (decord::cuda::CUThreadedDecoder::* const&)(), decord::cuda::CUThreadedDecoder*&&)+0x66) [0x7f15713669bc]
[bt] (4) /kaggle/working/reader/build/libdecord.so(std::result_of<void (decord::cuda::CUThreadedDecoder::* const&(decord::cuda::CUThreadedDecoder*&&))()>::type std::__invoke<void (decord::cuda::CUThreadedDecoder::* const&)(), decord::cuda::CUThreadedDecoder*>(void (decord::cuda::CUThreadedDecoder::* const&)(), decord::cuda::CUThreadedDecoder*&&)+0x3f) [0x7f1571366949]
[bt] (5) /kaggle/working/reader/build/libdecord.so(decltype (__invoke((*this)._M_pmf, (forward<decord::cuda::CUThreadedDecoder*>)({parm#1}))) std::_Mem_fn_base<void (decord::cuda::CUThreadedDecoder::*)(), true>::operator()<decord::cuda::CUThreadedDecoder*>(decord::cuda::CUThreadedDecoder*&&) const+0x2e) [0x7f15713668fa]
[bt] (6) /kaggle/working/reader/build/libdecord.so(void std::_Bind_simple<std::_Mem_fn<void (decord::cuda::CUThreadedDecoder::*)()> (decord::cuda::CUThreadedDecoder*)>::_M_invoke<0ul>(std::_Index_tuple<0ul>)+0x43) [0x7f15713668c5]
[bt] (7) /kaggle/working/reader/build/libdecord.so(std::_Bind_simple<std::_Mem_fn<void (decord::cuda::CUThreadedDecoder::*)()> (decord::cuda::CUThreadedDecoder*)>::operator()()+0x1d) [0x7f1571366813]
[bt] (8) /kaggle/working/reader/build/libdecord.so(std::thread::_State_impl<std::_Bind_simple<std::_Mem_fn<void (decord::cuda::CUThreadedDecoder::*)()> (decord::cuda::CUThreadedDecoder*)> >::_M_run()+0x1c) [0x7f15713667f2]
[bt] (9) /opt/conda/lib/python3.6/site-packages/matplotlib/../../../libstdc++.so.6(+0xb8408) [0x7f157222e408]
Ubuntu 19.04 with the latest driver update and CUDA 10.2 has the leak too, on a GTX 730.
Kaggle:
cuda_threaded_decoder.cc:35: Using device: Tesla P100-PCIE-16GB
cuda_threaded_decoder.cc:55: Kernel module version 418.67, so using our own stream.
OS: Debian stretch
Thanks @leigh-plt, this might be helpful for inference in the kernel!
@leigh-plt are you writing a kernel on how to use this video processing framework too?
Any updates on this? I am facing memory leak issues on CPU.
It seems that deleting the frame manually avoids the leak:
vr = VideoReader(video_path)
for frame in vr:
    print(frame.shape)
    del frame
Note: I tested it for CPU only, but from the source code it seems that this would be the case for GPU as well.
Same issue with the CPU memory leak. If you delete the frame, how do you process it further when it needs to feed a deep learning training framework?
Get a bunch of frames, train the model, get the next bunch.
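A minimal sketch of that pattern, assuming CPU decoding and a PyTorch training step (the batch size, model, and step function are placeholders):

import gc
import torch
from decord import VideoReader, cpu

def iter_batches(video_path, batch_size=32):
    # Yield frame chunks one at a time so decoded frames
    # can be released between training steps.
    vr = VideoReader(str(video_path), ctx=cpu(0))
    for start in range(0, len(vr), batch_size):
        ids = list(range(start, min(start + batch_size, len(vr))))
        batch = vr.get_batch(ids).asnumpy()  # copy the chunk to host memory
        yield torch.from_numpy(batch)
        del batch  # drop the reference before decoding the next chunk
        gc.collect()

# hypothetical training loop; `model` and `step` are placeholders
# for batch in iter_batches('video.mp4'):
#     step(model, batch)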