
Questions regarding design choices

Open tomresan opened this issue 1 year ago • 1 comments

Describe the question.

Hello everyone,

I have a question regarding some design choices when building a video dataset with DALI. My pipeline consists of several steps, some of which run inside DALI pipelines and some of which are plain Python code. Specifically, I have a WebDataset consisting of tar files that contain videos, so my first step is to invoke DALI's webdataset reader within a pipeline. Next, I would like to filter out unwanted video files before decoding, based on their metadata. I then invoke a second DALI pipeline to decode the remaining video files. After that, I process the decoded videos (e.g., cutting them into smaller snippets) and finally forward those to another DALI processing pipeline (e.g., for resizing). Dummy code looks something like this:

from nvidia.dali import pipeline_def
import nvidia.dali.fn as fn

@pipeline_def()
def wds_extraction(paths):
    raw_video_bytes = fn.readers.webdataset(paths=paths, ...)
    return raw_video_bytes

def filter(source):
    for video_bytes in source:
        duration, fps = get_metadata(video_bytes)
        ...
        yield video_bytes, duration, fps

@pipeline_def()
def decoding(source, device):
    inputs = fn.external_source(source, num_outputs=3)  # bytes, duration, fps
    video = fn.experimental.decoders.video(inputs[0], device=device)
    return video, *inputs[1:]  # simply forward duration and fps unchanged ...

def cutting_snippets(source):
    ...

@pipeline_def()
def resizing(source):
    fn.external_source(source, ...)
    ...

def iterator(paths):
    source = wds_extraction_iter(paths) # wraps the wds_extraction pipeline in a DALIRaggedIterator
    source = filter(source)
    source = decoding_iter(source) # wraps the decoding pipeline in a DALIRaggedIterator
    source = cutting_snippets(source)
    source = resizing_iter(source) # wraps the resizing pipeline in a DALIRaggedIterator
    yield from source

I wanted to ask whether this design is efficient despite the context switches between pure Python and DALI pipelines. Are there any performance disadvantages? Another rather annoying thing is that I have to forward every piece of data through every DALI pipeline, even when it is no longer modified. For example, I extract the duration and fps of each video in the filter function and want to pass them along to the end user. Hence, I must also feed them into the DALI pipelines and simply output them again.
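To make the forwarding concern concrete, here is a minimal pure-Python sketch of one conceivable alternative: keep the metadata in a FIFO queue on the Python side and re-attach it after a stage, so the stage only sees the payload. This is an assumption for illustration, not an established DALI pattern. The names `with_side_channel` and `stage` are hypothetical, and the sketch only holds if the stage preserves order and emits exactly one output per input (which would have to be arranged per sample or per batch when wrapping a real DALI iterator).

```python
from collections import deque


def with_side_channel(source, stage):
    """Run `stage` on payloads only and re-attach metadata afterwards.

    `source` yields (payload, metadata) pairs. `stage` (hypothetical) takes an
    iterable of payloads and yields one output per input, in the same order.
    """
    pending = deque()  # metadata of payloads already handed to the stage

    def payloads():
        for payload, metadata in source:
            pending.append(metadata)
            yield payload

    # The stage may read ahead (prefetch); the FIFO keeps the pairing intact.
    for output in stage(payloads()):
        yield output, pending.popleft()
```

With this wrapper, `duration` and `fps` would travel around the decoding stage instead of through it. Whether that is worth the extra bookkeeping depends on how heavy the forwarded data really is.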

Is there a better way to achieve a pipeline like this?

Check for duplicates

  • [x] I have searched the open bugs/issues and have found no duplicates for this bug report

tomresan avatar Sep 06 '24 19:09 tomresan

Hi @treasan,

Thank you for reaching out. Overall, your design is not bad. The one improvement I can think of is using a parallel external source to load and filter the videos asynchronously before decoding. As for the metadata you are passing through the pipelines, my impression is that it is not heavy, so the overhead should be small.
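A rough sketch of what the parallel external source suggestion could look like follows. It makes several assumptions: the tar shards are read with Python's standard `tarfile` module rather than DALI's webdataset reader, `keep` is a hypothetical user-supplied predicate standing in for the metadata check, and `filtered_batches` and `build_decoding_pipeline` are illustrative names. The DALI wiring uses `fn.external_source(..., parallel=True, batch=True)` with a generator function as the source; it has not been run here, and the values for `num_threads`, `device_id`, and `py_num_workers` are placeholders.

```python
import tarfile


def filtered_batches(tar_paths, keep, batch_size):
    """Yield lists of raw file contents for tar members accepted by `keep`.

    `keep(name, data)` is a hypothetical predicate, e.g. one that inspects
    container metadata such as duration or fps without decoding frames.
    """
    batch = []
    for path in tar_paths:
        with tarfile.open(path) as tar:
            for member in tar:
                if not member.isfile():
                    continue
                data = tar.extractfile(member).read()
                if keep(member.name, data):
                    batch.append(data)
                    if len(batch) == batch_size:
                        yield batch
                        batch = []
    if batch:  # last, possibly smaller, batch
        yield batch


def build_decoding_pipeline(tar_paths, keep, batch_size=4):
    """Untested wiring sketch: load and filter in a Python worker, decode in DALI."""
    # Imported lazily so the helper above stays usable without DALI installed.
    import numpy as np
    import nvidia.dali.fn as fn
    import nvidia.dali.types as types
    from nvidia.dali import pipeline_def

    def source():
        # A parallel external source must return CPU arrays, not bytes objects.
        for batch in filtered_batches(tar_paths, keep, batch_size):
            yield [np.frombuffer(data, dtype=np.uint8) for data in batch]

    @pipeline_def(batch_size=batch_size, num_threads=2, device_id=0,
                  py_num_workers=1)
    def decoding():
        encoded = fn.external_source(source=source, batch=True, parallel=True,
                                     dtype=types.UINT8)
        return fn.experimental.decoders.video(encoded, device="mixed")

    return decoding()
```

In this arrangement the loading and filtering run in a separate Python worker process while the pipeline decodes earlier batches. As far as I understand the documentation, a generator source is served by a single worker, so spreading the work over several workers would require a per-sample callable instead.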

mdabek-nvidia avatar Sep 11 '24 08:09 mdabek-nvidia