
GPU memory leak

Open ternaus opened this issue 5 years ago • 14 comments

I am decoding a list of videos with:

video = VideoReader(str(video_path), ctx=gpu(0))

frame_ids = list(range(300))

frames = video.get_batch(frame_ids).asnumpy()

On every iteration, GPU RAM consumption goes up until I get an out-of-memory error.
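
Roughly, the surrounding loop looks like this (video_paths is just a placeholder for my list of files):

from decord import VideoReader, gpu

for video_path in video_paths:
    video = VideoReader(str(video_path), ctx=gpu(0))
    frame_ids = list(range(300))
    # GPU memory grows on every iteration of this loop until decoding fails
    frames = video.get_batch(frame_ids).asnumpy()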

ternaus avatar Feb 28 '20 22:02 ternaus

The memory leak exists without .asnumpy() too. I use: frames = torch.utils.dlpack.from_dlpack(video.get_batch(frame_ids).to_dlpack()). The frames stay on the GPU in this case.
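
Spelled out with imports, the dlpack path is roughly this (same frame_ids as above; it should be a zero-copy handoff, so the tensor stays on the GPU):

from decord import VideoReader, gpu
from torch.utils.dlpack import from_dlpack

video = VideoReader(str(video_path), ctx=gpu(0))
frame_ids = list(range(300))
# wrap the decord NDArray as a torch tensor via DLPack; no .asnumpy() copy to host
frames = from_dlpack(video.get_batch(frame_ids).to_dlpack())
print(frames.device)  # cuda:0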

leigh-plt avatar Feb 29 '20 09:02 leigh-plt

Can you guys post your CUDA version / device type? I have tried on my local machine with a 1070 Ti and CUDA 10.1.243 and didn't notice any memory leak.

from decord import VideoReader
from decord import cpu, gpu

video_path = '/home/joshua/Dev/decord/examples/flipping_a_pancake.mkv'

video = VideoReader(str(video_path), ctx=gpu(0))

frame_ids = list(range(300))

for i in range(100):
  frames = video.get_batch(frame_ids).asnumpy()
  if i % 10 == 0: 
    print(frames.shape)

An nvidia-smi recording can be viewed here: https://asciinema.org/a/xgI8tFXNlpAoDcVJdxLgag8eW. GPU memory goes from 627M to 845M and stays pretty constant.

zhreshold avatar Mar 01 '20 00:03 zhreshold

Thanks @leigh-plt, I modified batch loading and it seems to be OK in the Kaggle notebook environment after the dlpack trick:

from decord import VideoReader, gpu
from torch.utils.dlpack import from_dlpack
import gc

def get_decord_video_batch(fname, sz, freq=10):
    "get batch tensor for inference, original for cropping and H,W of video"
    video = VideoReader(str(fname), ctx=gpu())
    # sample every `freq`-th frame and hand the batch to torch via dlpack
    # (previously: data = video.get_batch(range(0, len(video), freq)) )
    data = from_dlpack(video.get_batch(range(0, len(video), freq)).to_dlpack())
    H, W = data.shape[2:]
    del video; gc.collect()
    return (data, None, (H, W))

Although I had one successful run, there have been unsuccessful runs since then. How can we fix it?

KeremTurgutlu avatar Mar 05 '20 05:03 KeremTurgutlu

I'm facing memory leak issues on CPU; on GPU it's working fine.

akansal1 avatar Mar 06 '20 16:03 akansal1

I'm facing the same issues; neither the dlpack trick nor asnumpy works for me.

@KeremTurgutlu I'm on the Kaggle env as well.

yitang avatar Mar 13 '20 15:03 yitang

Can you guys post the Kaggle GPU and CUDA version?

zhreshold avatar Mar 13 '20 16:03 zhreshold

The GPU is a Tesla P100-PCIE-16GB.

CUDA is 10.0.130.

This is the traceback:

[16:44:39] /kaggle/working/reader/src/video/nvcodec/cuda_threaded_decoder.cc:55: Kernel module version 418.67, so using our own stream.

7%|▋ | 27/400 [00:23<03:55, 1.58it/s][16:44:40] /kaggle/working/reader/src/video/nvcodec/cuda_threaded_decoder.cc:35: Using device: Tesla P100-PCIE-16GB

[16:44:40] /kaggle/working/reader/src/video/nvcodec/cuda_threaded_decoder.cc:55: Kernel module version 418.67, so using our own stream.

7%|▋ | 28/400 [00:24<03:51, 1.61it/s][16:44:41] /kaggle/working/reader/src/video/nvcodec/cuda_threaded_decoder.cc:35: Using device: Tesla P100-PCIE-16GB

[16:44:41] /kaggle/working/reader/src/video/nvcodec/cuda_threaded_decoder.cc:55: Kernel module version 418.67, so using our own stream.

terminate called after throwing an instance of 'dmlc::Error'

what(): [16:44:41] /kaggle/working/reader/src/video/nvcodec/cuda_threaded_decoder.cc:332: Check failed: arr.defined()

Stack trace returned 10 entries:

[bt] (0) /kaggle/working/reader/build/libdecord.so(dmlc::StackTrace[abi:cxx11](unsigned long)+0x85) [0x7f15712ee059]

[bt] (1) /kaggle/working/reader/build/libdecord.so(dmlc::LogMessageFatal::~LogMessageFatal()+0x20) [0x7f15712ee334]

[bt] (2) /kaggle/working/reader/build/libdecord.so(decord::cuda::CUThreadedDecoder::ConvertThread()+0x1a5) [0x7f157135d659]

[bt] (3) /kaggle/working/reader/build/libdecord.so(void std::__invoke_impl<void, void (decord::cuda::CUThreadedDecoder::* const&)(), decord::cuda::CUThreadedDecoder*>(std::__invoke_memfun_deref, void (decord::cuda::CUThreadedDecoder::* const&)(), decord::cuda::CUThreadedDecoder*&&)+0x66) [0x7f15713669bc]

[bt] (4) /kaggle/working/reader/build/libdecord.so(std::result_of<void (decord::cuda::CUThreadedDecoder::* const&(decord::cuda::CUThreadedDecoder*&&))()>::type std::__invoke<void (decord::cuda::CUThreadedDecoder::* const&)(), decord::cuda::CUThreadedDecoder*>(void (decord::cuda::CUThreadedDecoder::* const&)(), decord::cuda::CUThreadedDecoder*&&)+0x3f) [0x7f1571366949]

[bt] (5) /kaggle/working/reader/build/libdecord.so(decltype (__invoke((this)._M_pmf, (forwarddecord::cuda::CUThreadedDecoder*)({parm#1}))) std::_Mem_fn_base<void (decord::cuda::CUThreadedDecoder::)(), true>::operator()decord::cuda::CUThreadedDecoder*(decord::cuda::CUThreadedDecoder*&&) const+0x2e) [0x7f15713668fa]

[bt] (6) /kaggle/working/reader/build/libdecord.so(void std::_Bind_simple<std::_Mem_fn<void (decord::cuda::CUThreadedDecoder::)()> (decord::cuda::CUThreadedDecoder)>::_M_invoke<0ul>(std::_Index_tuple<0ul>)+0x43) [0x7f15713668c5]

[bt] (7) /kaggle/working/reader/build/libdecord.so(std::_Bind_simple<std::_Mem_fn<void (decord::cuda::CUThreadedDecoder::)()> (decord::cuda::CUThreadedDecoder)>::operator()()+0x1d) [0x7f1571366813]

[bt] (8) /kaggle/working/reader/build/libdecord.so(std::thread::_State_impl<std::_Bind_simple<std::_Mem_fn<void (decord::cuda::CUThreadedDecoder::)()> (decord::cuda::CUThreadedDecoder)> >::_M_run()+0x1c) [0x7f15713667f2]

[bt] (9) /opt/conda/lib/python3.6/site-packages/matplotlib/../../../libstdc++.so.6(+0xb8408) [0x7f157222e408]

yitang avatar Mar 13 '20 16:03 yitang

Ubuntu 19.04 with the latest drivers and CUDA 10.2 has the leak too, on a 730 GTX.

Kaggle:

cuda_threaded_decoder.cc:35: Using device: Tesla P100-PCIE-16GB
cuda_threaded_decoder.cc:55: Kernel module version 418.67, so using our own stream.

OS: Debian stretch

leigh-plt avatar Mar 13 '20 22:03 leigh-plt

Thanks @leigh-plt, this might be helpful for inference in the kernel!

KeremTurgutlu avatar Mar 14 '20 07:03 KeremTurgutlu

@leigh-plt are you writing a kernel on how to use the video processing framework too?

yitang avatar Mar 14 '20 21:03 yitang

Any updates on this? I am facing memory leak issues on CPU.

anlarro avatar May 13 '20 20:05 anlarro

It seems that deleting the frame manually avoids the leak:

vr = VideoReader(video_path)
for frame in vr:
    print(frame.shape)
    del frame

Note: I tested it for CPU only, but from the source code it seems that this would be the case for GPU as well.
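
For the GPU context, the (untested) equivalent would presumably be:

from decord import VideoReader, gpu

vr = VideoReader(video_path, ctx=gpu(0))
for frame in vr:
    print(frame.shape)  # frame is a decord NDArray living on the GPU
    del frame           # drop the reference so the frame buffer can be reclaimed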

Black-Hack avatar Sep 08 '20 12:09 Black-Hack

Same issue with the CPU memory leak here. If you delete the frame, how do you process it further when this is applied in a deep learning training framework?

huang-ziyuan avatar Sep 29 '20 12:09 huang-ziyuan

Get a bunch of frames, train the model, get the next bunch.
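
Roughly like this (chunk size and train_step are placeholders, not part of decord):

from decord import VideoReader, cpu

vr = VideoReader(video_path, ctx=cpu(0))
chunk = 64
for start in range(0, len(vr), chunk):
    frame_ids = list(range(start, min(start + chunk, len(vr))))
    frames = vr.get_batch(frame_ids).asnumpy()
    train_step(frames)  # hypothetical training step on this bunch of frames
    del frames          # release the bunch before loading the next one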

Ehsan1997 avatar Feb 21 '21 18:02 Ehsan1997