DALI
fn.readers.video_resize at longer sequence lengths gives dali::CUDAError without nvidia-smi
Hello - I've been running into an issue where long sequence lengths crash the `fn.readers.video_resize` pipeline. I am using a V100 (16160 MB of memory) with the following versions:
- NVIDIA driver: 470.52.02
- CUDA driver version: 11.4
- DALI version: 1.3.0
Ideally, I want a sequence length of 450 with any stride for the decoded video. The decoded video would then be resized to 224x224 and normalized. When I run the pipeline below, nvidia-smi reports the following usage:
| sequence length | GPU memory usage (MB) |
|---|---|
| 325 | 2647 |
| 340 | 2743 |
| 345 | 2775 |
| 345 with prefetch queue depth 3 | 6871 |
| 345 with prefetch queue depth 5 | 10967 |
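For rough bookkeeping (my own back-of-the-envelope sketch, not something from the DALI docs): the numbers above are far closer to the cost of holding full-resolution 1920x1080 decoded frames than the resized 224x224 output, which suggests the decoder's full-resolution buffers dominate the footprint:

```python
def sequence_mb(frames, width, height, channels=3, bytes_per_px=1):
    """Approximate size of one uint8 RGB frame sequence in MiB."""
    return frames * width * height * channels * bytes_per_px / 1024**2

# The resized output for a 345-frame sequence is tiny...
print(round(sequence_mb(345, 224, 224)))     # ~50 MiB
# ...while full-resolution decoded frames roughly match nvidia-smi.
print(round(sequence_mb(345, 1920, 1080)))   # ~2047 MiB vs. 2775 MB observed
```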
However, any sequence length >= 350 frames results in this error message:
```
File "/workspace/EndoVideos/utilities/dali_loader/dali_loader.py", line 68, in <module>
    sequences_out, labels, start_frame_num, timestamps = pipe.run()
File "/opt/conda/lib/python3.8/site-packages/nvidia/dali/pipeline.py", line 917, in run
    return self.outputs()
File "/opt/conda/lib/python3.8/site-packages/nvidia/dali/pipeline.py", line 816, in outputs
    return self._outputs()
File "/opt/conda/lib/python3.8/site-packages/nvidia/dali/pipeline.py", line 900, in _outputs
    return self._pipe.Outputs()
RuntimeError: Critical error in pipeline:
Error when executing GPU operator readers__VideoResize encountered:
CUDA driver API error CUDA_ERROR_ILLEGAL_ADDRESS (700):
an illegal memory access was encountered
Current pipeline object is no longer valid.
terminate called after throwing an instance of 'dali::CUDAError'
  what(): CUDA runtime API error cudaErrorIllegalAddress (700):
an illegal memory access was encountered
Aborted (core dumped)
```
From the looks of it, this should not be a GPU memory capacity issue, since more than 11 GB of GPU memory should still be free. I've tried removing the normalization and halving the resize dimensions to 112x112, but the same issue persists at the same sequence length. Are there any known issues or limitations with GPU decoding/DALI and longer sequence lengths that I'm missing?
A sample of the code is below. I am unable to share the video files due to PHI, but they are substantially long .mp4 files (some > 40 min) with width 1920 and height 1080.
I am using the following pipe:

```python
import tempfile

from nvidia.dali import fn, pipeline_def

# Pipe parameters
batch_size = 1
sequence_length = 355
stride_length = 15
frame_num_based_labels = True  # assumed flag (not defined in the original snippet)

# Temp file creation --- {video_name} is the name of a file. Labels and lengths are dummy.
dali_extra_path = '/workspace/data/Datasets/timing_test_ds/'
file_list_txt = dali_extra_path + "{video_name}.mp4 1 0 10000\n"
file_list_txt += dali_extra_path + "{video_name}.mp4 2 0 10000\n"
file_list_txt += dali_extra_path + "{video_name}.mp4 1 0 10000\n"
file_list_txt += dali_extra_path + "{video_name}.mp4 3 0 10000\n"
file_list_txt += dali_extra_path + "{video_name}.mp4 2 0 10000\n"

tf = tempfile.NamedTemporaryFile()
tf.write(str.encode(file_list_txt))
tf.flush()

@pipeline_def
def video_pipe(file_list):
    video, label, start_frame_num, timestamps = fn.readers.video_resize(
        device="gpu", sequence_length=sequence_length, prefetch_queue_depth=1,
        file_list=file_list, skip_vfr_check=True, random_shuffle=True,
        initial_fill=1, stride=stride_length,
        file_list_frame_num=frame_num_based_labels,
        enable_frame_num=True, enable_timestamps=True, size=[224, 224],
        normalized=False)
    return video, label, start_frame_num, timestamps

pipe = video_pipe(batch_size=batch_size, num_threads=1, device_id=0, file_list=tf.name)
pipe.build()
```
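For anyone reproducing this without the original data: each line of the temp file follows the `path label start end` layout that the video reader's `file_list` argument expects (start/end are interpreted as frame numbers when `file_list_frame_num` is set). A self-contained sketch of what the generated file contains, using a placeholder filename since the real videos are PHI:

```python
import tempfile

# Placeholder entries; real paths/names cannot be shared (PHI).
entries = [("video_a.mp4", 1), ("video_a.mp4", 2), ("video_a.mp4", 3)]
file_list_txt = "".join(
    f"/workspace/data/Datasets/timing_test_ds/{name} {label} 0 10000\n"
    for name, label in entries
)

with tempfile.NamedTemporaryFile() as tf:
    tf.write(file_list_txt.encode())
    tf.flush()
    # tf.name is what gets passed as file_list= to the pipeline.
```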
Thank you in advance for the help!
Hi @MattWittbrodt,
Thank you for reporting the problem. I managed to reproduce it as well. We will keep you posted.
The minimal repro I'm using is (just for reference):

```python
from nvidia.dali import fn, pipeline_def

# Pipe parameters
batch_size = 1
sequence_length = 355

@pipeline_def
def video_pipe():
    video, label = fn.readers.video(
        device="gpu", sequence_length=sequence_length,
        filenames="test/sintel_trailer-1080p.mp4", labels=[])
    return video, label

pipe = video_pipe(batch_size=batch_size, num_threads=1, device_id=0)
pipe.build()
while True:
    pipe.run()
```
Thank you for the update and for looking into this issue!
Tracked as nv3452920.
@JanuszL, if this is of any help, I was able to recreate the issue on a T4 (16127 MB of GPU RAM) and a P100 (16280 MB of GPU RAM).
I don't see anything obviously wrong in the way DALI uses the Video SDK. I have asked the relevant team to check on their side. It may take a while.
Thank you very much for following up.
Hello, did you fix the bug? Thank you.
I'm sorry but I don't have any update yet.
We have found the root cause; a fix will be available shortly.
Nice! Thank you! Let me know when it is fixed :)
Yes, it was a simple thing - https://github.com/NVIDIA/DALI/pull/5307. Please check the nightly build that follows the merge of the mentioned PR.
DALI 1.35 has been released, https://github.com/NVIDIA/DALI/releases/tag/v1.35.0
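Since the fix ships with the 1.35 release, a small guard in loading code can make the requirement explicit. This is just a sketch with simplified version parsing (the helper name and the `>= 1.35` threshold are my own framing of the release note above; adapt to your packaging tooling):

```python
def has_video_reader_fix(dali_version: str) -> bool:
    """True if the given DALI version is at least 1.35, where the
    illegal-memory-access fix (PR #5307) first shipped."""
    major, minor = (int(part) for part in dali_version.split(".")[:2])
    return (major, minor) >= (1, 35)

# e.g. has_video_reader_fix(nvidia.dali.__version__)
```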
The problem should be fixed. If you face problems with the current release, please feel free to reopen the issue; I am closing it for now.