DALI Questions about optical flow quality & input video formats

Hi,

I'm relatively new to the DALI SDK & optical flow and have a few questions, and I hope this is the right place to ask.

Source code

Using DALI, I have implemented a simple video decoding pipeline that can optionally return optical flow frames as well, depending on the use case:

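from typing import Tuple

from nvidia.dali import fn, pipeline_def, types

# batch_size and of_presets are not shown in this post; they are assumed to be defined
# elsewhere in the script (of_presets maps a preset name to the value passed to DALI).

@pipeline_def
def video_pipe(filename: str, sequence_length: int, with_of: bool, goal_shape: Tuple[int, int], preset: str = "slow"):
    # Decode the video on the GPU into fixed-length frame sequences.
    video = fn.readers.video(device="gpu", filenames=filename, sequence_length=sequence_length,
                             shard_id=0, num_shards=1, random_shuffle=False, initial_fill=batch_size,
                             skip_vfr_check=True, pad_sequences=True, name="Reader", file_list_include_preceding_frame=True)

    # The reader returns RGB frames; convert to BGR for downstream use.
    video_bgr = fn.color_space_conversion(video, image_type=types.DALIImageType.RGB, output_type=types.DALIImageType.BGR)

    if with_of:
        # Optical flow is computed on the original RGB frames, then resized to goal_shape (width, height).
        of = fn.optical_flow(video, output_format=4, preset=of_presets[preset], enable_temporal_hints=True)
        of = fn.resize(of, resize_x=goal_shape[0], resize_y=goal_shape[1])
        return video_bgr, of
    return video_bgr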

This pipeline is based on the example code provided by the documentation, but with some slight adjustments to make it possible to toggle between calculating OF or not.

Question

My questions are specifically about the optical flow output generated by this pipeline. I have noticed that the quality of the generated flow depends on the input video file format. I encountered identical behavior in a previous version of my project that used OpenCV's CUDA implementation of NVIDIA Optical Flow, so I'm assuming that both options use the same underlying functions to calculate optical flow.

To test this, I fed the same video into the pipeline in two formats (.mov and .webm, both linked below) and drew the resulting optical flow. The two outputs differ quite a lot: the .webm output has much larger, blockier patterns than the .mov output. I also think both results contain a lot of noise. All input parameters were kept identical, e.g. grid size = 4.
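
For reference, a minimal sketch of one common way to draw such a flow field: hue encodes direction and brightness encodes magnitude. It is not necessarily the drawing method used for the videos linked below. The function name `flow_to_rgb`, the assumed `(H, W, 2)` layout with `(dx, dy)` channel order, and the `max_magnitude` parameter are all assumptions made for illustration.

```python
import numpy as np


def flow_to_rgb(flow, max_magnitude=None):
    """Map an (H, W, 2) flow field to an (H, W, 3) uint8 RGB image.

    Assumes channel 0 is dx and channel 1 is dy (an assumption, not
    something confirmed in this thread).
    """
    flow = np.asarray(flow, dtype=np.float32)
    dx, dy = flow[..., 0], flow[..., 1]

    magnitude = np.hypot(dx, dy)
    angle = np.arctan2(dy, dx)  # radians in [-pi, pi]

    # Direction -> hue in [0, 1].
    hue = (angle + np.pi) / (2.0 * np.pi)

    # Magnitude -> brightness in [0, 1]. Passing a fixed max_magnitude makes
    # images from different input files directly comparable.
    if max_magnitude is None:
        max_magnitude = float(magnitude.max())
    if max_magnitude > 0:
        value = np.clip(magnitude / max_magnitude, 0.0, 1.0)
    else:
        value = np.zeros_like(magnitude)

    # HSV -> RGB with saturation fixed at 1.
    h6 = hue * 6.0
    r = np.clip(np.abs(h6 - 3.0) - 1.0, 0.0, 1.0)
    g = np.clip(2.0 - np.abs(h6 - 2.0), 0.0, 1.0)
    b = np.clip(2.0 - np.abs(h6 - 4.0), 0.0, 1.0)

    rgb = np.stack([r, g, b], axis=-1) * value[..., None]
    return np.round(rgb * 255.0).astype(np.uint8)
```

Using the same `max_magnitude` for both files avoids per-image normalization, which can otherwise exaggerate small noisy vectors in a mostly static scene.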

My questions are as follows:

What is the reason that there is so much noise in these video outputs, while the input video is relatively noise-free and has relatively calm movements?
Why is there such a large difference between the outputs when the input video format differs?

I hope I am not constrained to using a specific type of video input format, as the video file formats this pipeline will be processing will be .mov files with mpeg4 codec (standard iPhone video output format).

Many thanks in advance!

Example videos

[Linked videos: input video; flow output for the .mov file; flow output for the .webm file]

Feb 04 '22 10:02 Renzzauw

Hi @Renzzauw,

Yes, DALI and OpenCV's CUDA implementation use the same hardware unit inside NVIDIA GPUs (available in Turing and newer architectures), so consistent results are expected.

Have you compared the frames returned by the video reader for the .mov and .webm files? Different compression options and quality levels would affect the result of the optical flow.
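
One way to make that comparison quantitative is a per-frame PSNR between the two decoded sequences. The following is only a sketch: it assumes the frames have already been copied to host memory as equally shaped 8-bit NumPy arrays, and the helper names `frame_psnr` and `compare_sequences` are hypothetical.

```python
import numpy as np


def frame_psnr(frame_a, frame_b, max_value=255.0):
    """Peak signal-to-noise ratio (dB) between two equally shaped frames."""
    a = np.asarray(frame_a, dtype=np.float64)
    b = np.asarray(frame_b, dtype=np.float64)
    if a.shape != b.shape:
        raise ValueError(f"shape mismatch: {a.shape} vs {b.shape}")

    mse = np.mean((a - b) ** 2)
    if mse == 0:
        return float("inf")  # identical frames
    return float(10.0 * np.log10(max_value ** 2 / mse))


def compare_sequences(seq_a, seq_b):
    """Return the PSNR for each pair of corresponding frames."""
    return [frame_psnr(a, b) for a, b in zip(seq_a, seq_b)]
```

Low or strongly fluctuating PSNR values would suggest that compression artifacts differ noticeably between the two files, even if the frames look similar by eye.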

I think it would be best to ask this question to people directly responsible for the Optical Flow SDK using this forum.

Feb 04 '22 13:02 JanuszL

Hi @JanuszL,

Thank you for your quick reply.

Good to have confirmed that both provide consistent results. I have compared the video frames of various file compression types, and nothing odd occurs in their output, except for some quality differences, yet I'm still surprised by the large differences between optical flow frames. Based on these points, I think I can conclude my issue is not caused by DALI, so I will ask my question on the Optical Flow SDK forum.

Feb 07 '22 10:02 Renzzauw

@Renzzauw,

If you see the same behavior with OpenCV's CUDA implementation, then the root cause is not related to the way DALI uses the NVIDIA Optical Flow API.

Feb 07 '22 11:02 JanuszL