DALI Add VideoDecoder and VideoDecoderSlice ops

Hi,

Currently I'm working on video tasks. To speed up preprocessing, I usually load all the encoded videos into memory or into TFRecords. However, I cannot find the relevant ops in DALI. The VideoReader does something similar, but it has to load the video from disk.

In this case, I'm wondering whether it is possible to add a VideoDecoder op that decodes video bytes held in memory, by slightly changing the VideoReader op. Also, since cropping seems to be supported for video sequences, it would be nice to add a VideoDecoderSlice op as well.

Thanks!

Jun 29 '20 01:06 foreverYoungGitHub

Hi, the VideoReader operator is tightly coupled with reading data from files, as it generates a randomly selected sequence from a random input file. Changing it to support TFRecord would require a substantial change in the assumptions and principles of the operator. Still, it is a valid request and we would be more than happy to review a PR implementing it.

Jun 29 '20 11:06 JanuszL

Thanks for your quick response!

I just went over the code quickly and it seems the cuvideodecoder is isolated from the VideoReader ops (https://github.com/NVIDIA/DALI/blob/de4fb5f5e18e2313ce6aadd3b420c7b3973ce04a/dali/operators/reader/loader/video_loader.cc). The video loader reads the file based on some rules and the video decoder tries to decode the video frames. So it should be easy to add another wrapper to create a new op.

However, when I check the ImageDecoder op, it lives in the decoder folder and uses its own host decoder and nvjpeg decoder to decode the images. In this case, if we want to add a video decoder op to DALI, should we create it in the decoder folder and duplicate the nvvideodecoder code there?

Jun 29 '20 13:06 foreverYoungGitHub

You are right, all code strictly connected with decoding is isolated in the nvdecoder folder; this is the nvdec part. However, it is not implemented as a full DALI operator. The current design treats it more as a helper class or an implementation detail of the VideoReader. Unlike the image examples you mentioned, for video there were some problems and performance considerations with decoupling VideoReader and VideoDecoder, so they stayed fused together. Also, this code was modeled after NVVL. With the current approach, the way to go would be to write another reader for TFRecord and reuse parts of the existing VideoReader. Let us discuss internally whether this is the time to revisit this design, and we will get back to you. Also, we have some changes to this part of the code coming in the next few days.

Jun 29 '20 17:06 awolant

With the availability of the video decoder operator, you can decode videos loaded from different sources (including TFRecord entries).

Apr 28 '23 11:04 JanuszL
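
For readers landing on this thread, here is a rough sketch of what decoding in-memory video bytes might look like. It assumes the operator referred to above is `fn.experimental.decoders.video` and that the encoded bytes are fed through `fn.external_source`. The helper names (`load_encoded_videos`, `batches`, `build_pipeline`) and the parameter values are illustrative only and are not part of DALI. The DALI import is deferred so that the byte-loading helpers can be used without a GPU.

```python
import numpy as np


def load_encoded_videos(paths):
    """Read each file fully into memory as a 1-D uint8 array of encoded bytes."""
    buffers = []
    for path in paths:
        with open(path, "rb") as f:
            buffers.append(np.frombuffer(f.read(), dtype=np.uint8))
    return buffers


def batches(buffers, batch_size):
    """Yield consecutive lists of at most `batch_size` buffers."""
    for start in range(0, len(buffers), batch_size):
        yield buffers[start:start + batch_size]


def build_pipeline(buffers, batch_size, device_id=0):
    """Build a pipeline that decodes in-memory video bytes.

    Assumption: DALI is installed, a GPU is available, and the installed
    version provides fn.experimental.decoders.video.
    """
    import nvidia.dali.fn as fn
    import nvidia.dali.types as types
    from nvidia.dali import pipeline_def

    @pipeline_def(batch_size=batch_size, num_threads=2, device_id=device_id)
    def pipe():
        # Encoded bytes come from memory rather than from a file reader.
        encoded = fn.external_source(
            source=batches(buffers, batch_size), dtype=types.UINT8
        )
        # "mixed" takes CPU input and decodes on the GPU.
        return fn.experimental.decoders.video(encoded, device="mixed")

    return pipe()
```

The same buffers could just as well come from TFRecord entries or any other in-memory source; only `load_encoded_videos` would change. Running the pipeline returned by `build_pipeline` should then yield batches of decoded frame sequences, though the exact output layout depends on the DALI version in use.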