Dealing with corrupt videos using experimental video decoder

Open tomresan opened this issue 1 year ago • 7 comments

Version

1.35

Describe the bug.

I am using fn.experimental.decoders.video to decode videos stored in a web dataset. Some files in my dataset are corrupt and/or can't be opened by DALI. Instead of throwing an error, the entire process halts with a segmentation fault when the decoder sees a corrupt video:

[mov,mp4,m4a,3gp,3g2,mj2 @ 0x7f0ccc298d00] moov atom not found
[/opt/dali/dali/operators/reader/loader/video/frames_decoder.cc:237] Failed to open video file memory filedue to Invalid data found when processing input
Segmentation fault (core dumped)

Essentially, this issue is similar to #5155, but for the experimental decoder instead of the video reader.

Minimum reproducible example

No response

Relevant log output

No response

Other/Misc.

No response

Check for duplicates

  • [X] I have searched the open bugs/issues and have found no duplicates for this bug report

tomresan avatar May 28 '24 20:05 tomresan

Hi @Tomsen1410,

Thank you for reporting this. Can you check whether the videos are indeed corrupted by opening them in FFmpeg, or whether this is just DALI behavior? DALI operators work in push mode, processing the whole batch at a time. When DALI fails to process a given sample in the batch, it cannot ask for another sample to replace the faulty one, so it throws an error. The only solution that comes to my mind is to provide an empty sample or a zeroed one (as some operators may not handle empty tensors gracefully).
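
Since a bad sample cannot be swapped out once it is inside a batch, one option is to screen the encoded buffers before they reach DALI. The sketch below is an assumption of how such a pre-check could look, not part of DALI: it walks the top-level boxes of an MP4/MOV buffer and reports whether a complete `moov` box is present, which is the condition behind the "moov atom not found" message in the log. The names `top_level_boxes` and `has_moov_atom` are hypothetical. Passing this check does not guarantee that the video decodes; it only catches this particular failure.

```python
import struct


def top_level_boxes(data: bytes):
    """Yield (box_type, offset, size) for each top-level ISO BMFF box."""
    pos = 0
    total = len(data)
    while pos + 8 <= total:
        # Each box starts with a 32-bit big-endian size and a 4-byte type.
        size, box_type = struct.unpack_from(">I4s", data, pos)
        header = 8
        if size == 1:
            # A size of 1 means a 64-bit size follows the type field.
            if pos + 16 > total:
                return
            (size,) = struct.unpack_from(">Q", data, pos + 8)
            header = 16
        elif size == 0:
            # A size of 0 means the box runs to the end of the buffer.
            size = total - pos
        if size < header:
            return  # malformed header, stop scanning
        yield box_type, pos, size
        pos += size


def has_moov_atom(data: bytes) -> bool:
    """Return True if a complete top-level 'moov' box is present."""
    return any(
        box_type == b"moov" and offset + size <= len(data)
        for box_type, offset, size in top_level_boxes(data)
    )
```

Buffers that fail the check could then be dropped or replaced with a placeholder before the batch is assembled, so the decoder never sees them.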

JanuszL avatar May 28 '24 20:05 JanuszL

Could you provide the ffmpeg command I should test on the video?

tomresan avatar May 28 '24 20:05 tomresan

You can check this thread and see if FFmpeg can decode and save frames to a file.
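
The linked thread is not reproduced here, so the following is only a commonly used check and not necessarily the one it recommends: ask FFmpeg to decode the whole input and discard the output (`-f null -`), printing only errors (`-v error`). The helper names `ffmpeg_check_command` and `can_decode` are hypothetical, and the sketch assumes an `ffmpeg` binary is available on `PATH`.

```python
import subprocess


def ffmpeg_check_command(path):
    """Build an FFmpeg command that decodes `path` and discards the frames."""
    return ["ffmpeg", "-v", "error", "-i", path, "-f", "null", "-"]


def can_decode(path):
    """Return True if FFmpeg decodes the file without reporting any error."""
    result = subprocess.run(
        ffmpeg_check_command(path),
        capture_output=True,
        text=True,
    )
    # FFmpeg may exit with 0 and still log decode errors, so check both.
    return result.returncode == 0 and not result.stderr.strip()
```

The equivalent shell invocation is `ffmpeg -v error -i /path/to/file.mp4 -f null -`; any output on stderr points to a problem with the file.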

JanuszL avatar May 28 '24 20:05 JanuszL

OK, I have run ffmpeg on the corrupted file and it throws the same error:

[mov,mp4,m4a,3gp,3g2,mj2 @ 0x559fa2226100] moov atom not found
[in#0 @ 0x559fa2225fc0] Error opening input: Invalid data found when processing input
Error opening input file /path/to/file.mp4.
Error opening input files: Invalid data found when processing input

I am using ffmpeg 6.1.1 installed from the conda-forge channel.

You can find the corrupted file attached.

https://github.com/NVIDIA/DALI/assets/15103267/f4d3216d-e825-49dc-975f-472d44dff41b

tomresan avatar May 28 '24 22:05 tomresan

If FFmpeg cannot handle the video correctly, I don't think we can do more than that. As you are using webdataset, you can manually edit the index file generated by the wds2idx.py script to skip the mentioned sample. I also noticed that DALI doesn't provide a meaningful error message (and crashes instead of raising an exception) when it encounters a faulty file. Can you recheck the DALI nightly build once https://github.com/NVIDIA/DALI/pull/5491 is merged, check the offset of the faulty sample in the webdataset, and adjust the index file?
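
To find where a faulty sample sits inside the webdataset shard, the standard-library `tarfile` module can report each member's offsets. This is a minimal sketch under stated assumptions: the shard is an uncompressed tar, the function name `locate_members` and the file names in the usage note are hypothetical, and the layout of the index file itself is not covered here. The thread does not say whether the index stores the header offset or the data offset, so both are returned for comparison.

```python
import io
import tarfile


def locate_members(fileobj, wanted):
    """Map each requested member name to (header_offset, data_offset, size)."""
    found = {}
    # "r:" opens the archive for reading without any compression.
    with tarfile.open(fileobj=fileobj, mode="r:") as tar:
        for member in tar:
            if member.isfile() and member.name in wanted:
                found[member.name] = (
                    member.offset,       # start of the 512-byte tar header
                    member.offset_data,  # start of the file contents
                    member.size,         # payload size in bytes
                )
    return found
```

For example, `with open("shard.tar", "rb") as f: print(locate_members(f, {"clip.mp4"}))` would print the offsets to look for in the index file.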

JanuszL avatar May 29 '24 10:05 JanuszL

Yes, that is exactly the problem. I have no way of catching the error and the entire training process stops.

I will check once it is merged. What exactly do you mean by adjusting the index file? Once the decoder throws proper errors, there is no need to alter the index file anymore, no?

tomresan avatar May 29 '24 12:05 tomresan

@Tomsen1410 - https://github.com/NVIDIA/DALI/pull/5491 has been merged. Please check the next nightly build to see if that helps.

JanuszL avatar Jun 03 '24 11:06 JanuszL

1.39 and 1.40 are public. Closing.

JanuszL avatar Aug 05 '24 11:08 JanuszL