
Video reader fails when random shuffle is enabled

Open Grzego opened this issue 8 months ago • 2 comments

Version

1.30

Describe the bug.

Hi! I recently ran into a bug: when I create a pipeline with random_shuffle=True, it fails with the following error:

Detected variable frame rate video. The decoder returned frame that is past the expected one

This happens even though the video I'm using was transcoded to a fixed frame rate (5 fps) and works perfectly fine when random_shuffle=False is passed to the pipeline. I'm attaching the code for reproduction and the file that fails for me. I'd appreciate any help with this issue. :)

File used to reproduce the issue (basically black screen, duration=4m40s, fps=5): https://github.com/NVIDIA/DALI/assets/5787705/57817017-b574-4bbf-b7aa-d9c61d856b66
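
One way to double-check the fixed-frame-rate claim independently of DALI is to look at the packet timestamps reported by ffprobe. The snippet below is a minimal sketch, assuming ffprobe is installed and on PATH; the helper names read_packet_timestamps and is_constant_frame_rate, and the 1% tolerance, are hypothetical choices made for illustration and are not part of DALI or FFmpeg.

```python
import subprocess


def read_packet_timestamps(path):
    """Return sorted presentation timestamps (seconds) of the first video stream.

    Assumes the ffprobe CLI is available on PATH.
    """
    out = subprocess.run(
        ["ffprobe", "-v", "error", "-select_streams", "v:0",
         "-show_entries", "packet=pts_time", "-of", "csv=p=0", path],
        capture_output=True, text=True, check=True,
    ).stdout
    stamps = []
    for line in out.splitlines():
        # Keep only the first CSV field; skip empty or unavailable values.
        field = line.split(",")[0].strip()
        if field and field != "N/A":
            stamps.append(float(field))
    # Packets are listed in decode order, so sort to get presentation order.
    return sorted(stamps)


def is_constant_frame_rate(timestamps, rel_tol=0.01):
    """Check that all gaps between consecutive timestamps match the mean gap.

    rel_tol is an illustrative tolerance (1% of the mean gap), not a standard.
    """
    if len(timestamps) < 3:
        return True
    deltas = [b - a for a, b in zip(timestamps, timestamps[1:])]
    expected = (timestamps[-1] - timestamps[0]) / len(deltas)
    return all(abs(d - expected) <= rel_tol * expected for d in deltas)


# Example usage (requires the attached file and ffprobe):
#   stamps = read_packet_timestamps("void-4m40s.mp4")
#   print(len(stamps), is_constant_frame_rate(stamps))
```

If the check passes for the file, the container timestamps are evenly spaced, which would point to the reader or decoder rather than the file itself.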

Minimum reproducible example

import nvidia.dali.fn as fn
from nvidia.dali import pipeline_def
from nvidia.dali.plugin.pytorch import DALIGenericIterator, LastBatchPolicy


class CustomDataloader:
    def __init__(
        self,
        video_paths,
        batch_size,
        num_threads,
        sequence_length,
        stride=1,
        pad_sequences=True,
        transpose=True,
        resolution=None,
        random_shuffle=False,
        drop_last=False,
    ):
        self.sequence_length = sequence_length
        self.stride = stride
        self.pad_sequences = pad_sequences
        self.transpose = transpose
        self.resolution = resolution
        self.random_shuffle = random_shuffle

        self.video_pipeline = self._video_pipeline(
            video_paths=video_paths,
            batch_size=batch_size,
            num_threads=num_threads,
            device_id=0,
        )
        self.dali_iterator = DALIGenericIterator(
            [self.video_pipeline],
            ["frames", "video_idx", "frame_idx"],
            reader_name="clips_reader",
            auto_reset=False,
            last_batch_policy=LastBatchPolicy.DROP if drop_last else LastBatchPolicy.PARTIAL,
        )

    @pipeline_def
    def _video_pipeline(self, video_paths):
        frames, video_idx, frame_idx = fn.readers.video(
            name="clips_reader",
            device="gpu",
            filenames=video_paths,
            enable_timestamps=True,
            labels=[],
            sequence_length=self.sequence_length,
            pad_sequences=self.pad_sequences,
            shard_id=0,
            num_shards=1,
            random_shuffle=self.random_shuffle,
            initial_fill=512,
            stride=self.stride,
            prefetch_queue_depth=4,
            pad_last_batch=True,
            read_ahead=True,
            seed=42,
            file_list_include_preceding_frame=False,
        )
        if self.transpose:
            frames = fn.transpose(frames, perm=(0, 3, 1, 2))

        return frames, video_idx, frame_idx

    def __iter__(self):
        for batch, *_ in self.dali_iterator:
            yield batch

        self.dali_iterator.reset()


def main():
    print("> random_shuffle=False")
    dataloader = CustomDataloader(
        ["void-4m40s.mp4"],
        batch_size=1,
        num_threads=1,
        sequence_length=5 * 15,
        random_shuffle=False,
        drop_last=True,
    )

    for _ in dataloader:
        pass

    print("> random_shuffle=True")
    dataloader = CustomDataloader(
        ["void-4m40s.mp4"],
        batch_size=1,
        num_threads=1,
        sequence_length=5 * 15,
        random_shuffle=True,
        drop_last=True,
    )

    for _ in dataloader:
        pass


if __name__ == "__main__":
    main()

Relevant log output

> random_shuffle=False
> random_shuffle=True
140362360223296 Exception in thread: [/opt/dali/dali/operators/reader/loader/video_loader.cc:683] Detected variable frame rate video. The decoder returned frame that is past the expected one
Stacktrace (6 entries):
[frame 0]: /../python3.10/site-packages/nvidia/dali/libdali_operators.so(+0x69a2ee) [0x7fa9d20252ee]
[frame 1]: /../python3.10/site-packages/nvidia/dali/libdali_operators.so(+0x54008c) [0x7fa9d1ecb08c]
[frame 2]: /../python3.10/site-packages/nvidia/dali/libdali_operators.so(+0x284f6fc) [0x7fa9d41da6fc]
[frame 3]: /../python3.10/site-packages/nvidia/dali/libdali_operators.so(+0x4ad4a50) [0x7fa9d645fa50]
[frame 4]: /lib/x86_64-linux-gnu/libc.so.6(+0x94b43) [0x7faa03ae9b43]
[frame 5]: /lib/x86_64-linux-gnu/libc.so.6(+0x126a00) [0x7faa03b7ba00]

Traceback (most recent call last):
  File "/../bug.py", line 117, in <module>
    main()
  File "/../bug.py", line 112, in main
    for _ in dataloader:
  File "/../bug.py", line 82, in __iter__
    for batch, *_ in self.dali_iterator:
  File "/../python3.10/site-packages/nvidia/dali/plugin/pytorch.py", line 211, in __next__
    outputs = self._get_outputs()
  File "/../python3.10/site-packages/nvidia/dali/plugin/base_iterator.py", line 298, in _get_outputs
    outputs.append(p.share_outputs())
  File "/../python3.10/site-packages/nvidia/dali/pipeline.py", line 1003, in share_outputs
    return self._pipe.ShareOutputs()
RuntimeError: Critical error in pipeline:
Error when executing GPU operator readers__Video encountered:
Error in worker thread: [/opt/dali/dali/operators/reader/loader/video_loader.cc:683] Detected variable frame rate video. The decoder returned frame that is past the expected one
Stacktrace (6 entries):
[frame 0]: /../python3.10/site-packages/nvidia/dali/libdali_operators.so(+0x69a2ee) [0x7fa9d20252ee]
[frame 1]: /../python3.10/site-packages/nvidia/dali/libdali_operators.so(+0x54008c) [0x7fa9d1ecb08c]
[frame 2]: /../python3.10/site-packages/nvidia/dali/libdali_operators.so(+0x284f6fc) [0x7fa9d41da6fc]
[frame 3]: /../python3.10/site-packages/nvidia/dali/libdali_operators.so(+0x4ad4a50) [0x7fa9d645fa50]
[frame 4]: /lib/x86_64-linux-gnu/libc.so.6(+0x94b43) [0x7faa03ae9b43]
[frame 5]: /lib/x86_64-linux-gnu/libc.so.6(+0x126a00) [0x7faa03b7ba00]

Current pipeline object is no longer valid.

Other/Misc.

No response

Check for duplicates

  • [X] I have searched the open bugs/issues and have found no duplicates for this bug report

Grzego • Oct 05 '23 11:10

Hi @Grzego,

Thank you for reaching out. The code you provided reproduces the problem on my side too. Let me investigate what is going on under the hood.

JanuszL • Oct 05 '23 13:10

I checked the code, and it seems that the video decoder is fed the same packets from the video container in both cases but returns frames with different timestamps. In the faulty case we seek to frame 750, but the decoder returns 749 and then 751, skipping 750; when shuffling is off, it returns 750 and 751 as expected. I have reported this to the corresponding video team, but it may take a while for them to get back to me.
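
To make the failure mode concrete, here is a toy model of it in Python. This is a sketch under stated assumptions, not DALI's actual implementation: sequence_starts, collect_sequence and FrameMismatch are hypothetical names, the step between sequences is assumed to equal sequence_length, and padding of a trailing partial sequence is ignored. It shows that frame 750 is the start of one of the sequences in the reproduction (280 s at 5 fps gives 1400 frames, read in sequences of 75), and how a decoder that returns 749 and then 751 trips an "expected frame" check.

```python
class FrameMismatch(RuntimeError):
    """Raised when the decoder output jumps past the expected frame."""


def sequence_starts(num_frames, sequence_length):
    # Assumption: consecutive sequences start sequence_length frames apart
    # and only full sequences are counted.
    return list(range(0, num_frames - sequence_length + 1, sequence_length))


def collect_sequence(decoded_frames, start, length):
    """Gather frames start..start+length-1 from a stream of decoded frame indices."""
    expected = start
    collected = []
    for idx in decoded_frames:
        if idx < expected:
            # Frames before the target are decoded only to reach it after a seek.
            continue
        if idx > expected:
            raise FrameMismatch(
                f"expected frame {expected}, decoder returned {idx}"
            )
        collected.append(idx)
        expected += 1
        if len(collected) == length:
            break
    return collected


starts = sequence_starts(num_frames=280 * 5, sequence_length=5 * 15)
print(750 in starts)  # frame 750 is a sequence start in this model

# Shuffling off: the decoder returns 750, 751, ... as expected.
print(collect_sequence([750, 751, 752], start=750, length=3))

# Faulty case: 749 is skipped as pre-roll, then 751 arrives instead of 750.
try:
    collect_sequence([749, 751, 752], start=750, length=3)
except FrameMismatch as err:
    print(err)
```

In this model the sequential case never seeks into the middle of the file, while a shuffled reader can jump straight to any start such as 750, which is where a decoder that skips the target frame after a seek would be noticed.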

JanuszL • Oct 06 '23 18:10