DALI Indexing video with binary mask

Describe the question.

Is it possible to index data nodes using a binary mask within a pipeline?

More precisely, I want to remove possible black bars at the top and bottom of videos. To do that, I find all rows in a video that contain only black pixels and get a binary mask that is true for every non-black row:

    mask = fn.reductions.sum(video, axes=[0, 2, 3]) > 0

How can I now use the mask to crop the video accordingly?

Check for duplicates

[X] I have searched the open bugs/issues and have found no duplicates for this bug report

Mar 15 '24 00:03 tomresan

Hi @Tomsen1410,

I think you can try something like this:

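    from nvidia.dali import fn, types
    # True for every row (height axis) that contains a non-black pixel
    mask = fn.reductions.sum(video, axes=[0, 2, 3]) > 0
    # Number of non-black rows, i.e. the height of the crop
    shape = fn.reductions.sum(fn.cast(mask, dtype=types.INT32), dtype=types.INT32)
    # First `shape` entries of the mask, inverted: marks the black rows in that prefix
    mask_shifted = fn.slice(fn.cast(mask, dtype=types.UINT8), 0, shape, axes=[0]) == 0
    # Number of black rows in the prefix, i.e. the height of the top bar
    anchor = fn.reductions.sum(fn.cast(mask_shifted, dtype=types.INT32), dtype=types.INT32)
    video_trim = fn.slice(video, anchor, shape, axes=[1])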

This works as long as the content is taller than the black bars. You can also try the nonsilent_region operator, which was designed with audio signals in mind but should work in your case as well:

    anchor, shape = fn.nonsilent_region(fn.cast(mask, dtype=types.UINT8), reset_interval=1, window_length=1)

It would be best for you to check which one performs better with your data.
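To see why the first approach needs the content to be taller than the bars, here is a minimal pure-Python sketch of the same counting arithmetic on a plain list. It does not use DALI; the helper names `crop_by_counting` and `crop_by_bounds` and the sample masks are hypothetical, and `crop_by_bounds` is only a direct first/last non-black row reference for comparison, not a model of any DALI operator.

```python
def crop_by_counting(mask):
    """Mirror the sum/slice arithmetic from the first snippet on a plain list.

    mask[i] is truthy when row i contains any non-black pixel.
    Returns (anchor, length) for a slice along the height axis.
    """
    length = sum(1 for m in mask if m)       # number of non-black rows
    head = mask[:length]                     # first `length` entries of the mask
    anchor = sum(1 for m in head if not m)   # black rows found in that prefix
    return anchor, length


def crop_by_bounds(mask):
    """Reference result: first non-black row and the span up to the last one."""
    rows = [i for i, m in enumerate(mask) if m]
    if not rows:
        return 0, 0
    return rows[0], rows[-1] - rows[0] + 1


# Hypothetical masks: 1 marks a row with content, 0 marks an all-black row.
small_bars = [0, 0, 1, 1, 1, 1, 1, 0]   # top bar 2, content 5, bottom bar 1
tall_top_bar = [0, 0, 0, 0, 1, 1, 0]    # top bar 4, content 2, bottom bar 1

print(crop_by_counting(small_bars), crop_by_bounds(small_bars))
print(crop_by_counting(tall_top_bar), crop_by_bounds(tall_top_bar))
```

For `small_bars` both helpers agree. For `tall_top_bar` the counting version only looks at the first two rows, so it undercounts the top bar and the resulting crop would start inside the bar. If your videos can have bars taller than the content, that is a reason to prefer an approach that locates the first and last non-black row directly.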

Mar 15 '24 13:03 JanuszL

Closing this issue. If there is anything else we can help with, please reopen this one or create a new one.

Sep 04 '24 18:09 awolant