DALI
fn.readers.video_resize at longer sequence lengths gives dali::CUDAError without nvidia-smi
Hello - I've been running into an issue where long sequence lengths crash the `fn.readers.video_resize` pipeline. I am using a V100 (16160 MB of memory) with the following versions:
- NVIDIA driver: 470.52.02
- CUDA driver version: 11.4
- DALI version: 1.3.0
Ideally, I want a sequence length of 450 with any stride for the decoded video. The decoded video would then be resized to 224x224 and normalized. When I run the pipeline below, nvidia-smi reports the following usage:
| sequence length | GPU memory usage (MB) |
|---|---|
| 325 | 2647 |
| 340 | 2743 |
| 345 | 2775 |
| 345 with prefetch queue depth 3 | 6871 |
| 345 with prefetch queue depth 5 | 10967 |
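For rough bookkeeping (my own back-of-the-envelope sketch, not something from the DALI docs): the numbers above are far closer to the cost of holding full-resolution 1920x1080 decoded frames than the resized 224x224 output, which suggests the decoder's full-resolution buffers dominate the footprint:

```python
def sequence_mb(frames, width, height, channels=3, bytes_per_px=1):
    """Approximate size of one uint8 RGB frame sequence in MiB."""
    return frames * width * height * channels * bytes_per_px / 1024**2

# The resized output for a 345-frame sequence is tiny...
print(round(sequence_mb(345, 224, 224)))     # ~50 MiB
# ...while full-resolution decoded frames roughly match nvidia-smi.
print(round(sequence_mb(345, 1920, 1080)))   # ~2047 MiB vs. 2775 MB observed
```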
However, any sequence length >= 350 frames results in this error message:
```
File "/workspace/EndoVideos/utilities/dali_loader/dali_loader.py", line 68, in <module>
    sequences_out, labels, start_frame_num, timestamps = pipe.run()
File "/opt/conda/lib/python3.8/site-packages/nvidia/dali/pipeline.py", line 917, in run
    return self.outputs()
File "/opt/conda/lib/python3.8/site-packages/nvidia/dali/pipeline.py", line 816, in outputs
    return self._outputs()
File "/opt/conda/lib/python3.8/site-packages/nvidia/dali/pipeline.py", line 900, in _outputs
    return self._pipe.Outputs()
RuntimeError: Critical error in pipeline:
Error when executing GPU operator readers__VideoResize encountered:
CUDA driver API error CUDA_ERROR_ILLEGAL_ADDRESS (700):
an illegal memory access was encountered
Current pipeline object is no longer valid.
terminate called after throwing an instance of 'dali::CUDAError'
  what(): CUDA runtime API error cudaErrorIllegalAddress (700):
an illegal memory access was encountered
Aborted (core dumped)
```
From the looks of it, this should not be a GPU memory capacity issue, since more than 11 GB of GPU memory should still be free. I've tried removing the normalization and halving the resize dimensions to 112x112, but the same issue persists at the same sequence length. Are there any known issues or limitations with GPU decoding/DALI and longer sequence lengths that I'm missing?
A sample of the code is below. I am unable to share the video files due to PHI, but they are substantially long .mp4 files (some > 40 min) with width 1920 and height 1080.
I am using the following pipe:

```python
import tempfile

from nvidia.dali import fn, pipeline_def

# Pipe parameters
batch_size = 1
sequence_length = 355
stride_length = 15
frame_num_based_labels = True  # assumed flag (not defined in the original snippet)

# Temp file creation --- {video_name} is the name of a file. Labels and lengths are dummy.
dali_extra_path = '/workspace/data/Datasets/timing_test_ds/'
file_list_txt = dali_extra_path + "{video_name}.mp4 1 0 10000\n"
file_list_txt += dali_extra_path + "{video_name}.mp4 2 0 10000\n"
file_list_txt += dali_extra_path + "{video_name}.mp4 1 0 10000\n"
file_list_txt += dali_extra_path + "{video_name}.mp4 3 0 10000\n"
file_list_txt += dali_extra_path + "{video_name}.mp4 2 0 10000\n"

tf = tempfile.NamedTemporaryFile()
tf.write(str.encode(file_list_txt))
tf.flush()

@pipeline_def
def video_pipe(file_list):
    video, label, start_frame_num, timestamps = fn.readers.video_resize(
        device="gpu", sequence_length=sequence_length, prefetch_queue_depth=1,
        file_list=file_list, skip_vfr_check=True, random_shuffle=True,
        initial_fill=1, stride=stride_length,
        file_list_frame_num=frame_num_based_labels,
        enable_frame_num=True, enable_timestamps=True, size=[224, 224],
        normalized=False)
    return video, label, start_frame_num, timestamps

pipe = video_pipe(batch_size=batch_size, num_threads=1, device_id=0, file_list=tf.name)
pipe.build()
```
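For anyone reproducing this without the original data: each line of the temp file follows the `path label start end` layout that the video reader's `file_list` argument expects (start/end are interpreted as frame numbers when `file_list_frame_num` is set). A self-contained sketch of what the generated file contains, using a placeholder filename since the real videos are PHI:

```python
import tempfile

# Placeholder entries; real paths/names cannot be shared (PHI).
entries = [("video_a.mp4", 1), ("video_a.mp4", 2), ("video_a.mp4", 3)]
file_list_txt = "".join(
    f"/workspace/data/Datasets/timing_test_ds/{name} {label} 0 10000\n"
    for name, label in entries
)

with tempfile.NamedTemporaryFile() as tf:
    tf.write(file_list_txt.encode())
    tf.flush()
    # tf.name is what gets passed as file_list= to the pipeline.
```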
Thank you in advance for the help!
Hi @MattWittbrodt,
Thank you for reporting the problem. I managed to reproduce it as well. We will keep you posted.
The minimal repro I'm using is (just for reference):

```python
from nvidia.dali import fn, pipeline_def

# Pipe parameters
batch_size = 1
sequence_length = 355

@pipeline_def
def video_pipe():
    video, label = fn.readers.video(
        device="gpu", sequence_length=sequence_length,
        filenames="test/sintel_trailer-1080p.mp4", labels=[])
    return video, label

pipe = video_pipe(batch_size=batch_size, num_threads=1, device_id=0)
pipe.build()
while True:
    pipe.run()
```
Thank you for the update and for looking into this issue!
Tracked as nv3452920.
@JanuszL, if this is of any help, I was able to recreate the issue on a T4 (16127 MB of GPU RAM) and a P100 (16280 MB of GPU RAM).
I don't see anything obviously wrong in the way DALI uses the Video SDK. I have asked the relevant team to check on their side. It may take a while.
Thank you very much for following up.
Hello, did you fix the bug? Thank you.
I'm sorry but I don't have any update yet.
We have found the root cause; a fix will be available shortly.
Nice! Thank you! Let me know when it is fixed :)
Yes, it was a simple thing - https://github.com/NVIDIA/DALI/pull/5307. Please check the nightly build that follows the merge of the mentioned PR.
DALI 1.35 has been released, https://github.com/NVIDIA/DALI/releases/tag/v1.35.0
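Since the fix ships with the 1.35 release, a small guard in loading code can make the requirement explicit. This is just a sketch with simplified version parsing (the helper name and the `>= 1.35` threshold are my own framing of the release note above; adapt to your packaging tooling):

```python
def has_video_reader_fix(dali_version: str) -> bool:
    """True if the given DALI version is at least 1.35, where the
    illegal-memory-access fix (PR #5307) first shipped."""
    major, minor = (int(part) for part in dali_version.split(".")[:2])
    return (major, minor) >= (1, 35)

# e.g. has_video_reader_fix(nvidia.dali.__version__)
```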
The problem should be fixed. If you face problems with the current release, please feel free to reopen the issue; I am closing it for now.