video2dataset

Unify workers

Open iejMac opened this issue 2 years ago • 2 comments

DownloadWorker, SubsetWorker, and OpticalFlowWorker are all different parameterizations of the same "worker" class, which we need to make.

Some important things:

  • The difference between the Download and Subset workers is the input format, i.e. if input_format == "webdataset" then Subset, else Download.
  • The difference between the Download and OpticalFlow workers is the output: what do you write? In DownloadWorker the output is all the streams; in OpticalFlowWorker the output is metadata.

This needs to be added:

TODO:

  • [ ] decide between Subset and Download based on input format and unify them
  • [ ] decide what to write based on param (chosen metadata or all streams)
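A rough sketch of what the two TODO items could look like in one class. Everything here is hypothetical and not taken from the video2dataset codebase: the `UnifiedWorker` name, the `output_keys` parameter, and the dict-shaped samples are assumptions made only to illustrate the two decisions (input format picks the read path, a parameter picks what gets written).

```python
from typing import Any, Dict, Iterable, List, Optional

# Assumption: a sample is a flat dict of named outputs (streams and/or metadata).
Sample = Dict[str, Any]


class UnifiedWorker:
    """Hypothetical single worker covering the Download/Subset/OpticalFlow cases."""

    def __init__(self, input_format: str, output_keys: Optional[List[str]] = None):
        self.input_format = input_format
        # None means "write everything" (the DownloadWorker behaviour);
        # a list such as ["metadata"] means "write only these" (OpticalFlowWorker).
        self.output_keys = output_keys
        # TODO item 1: the read path follows from the input format alone.
        self.mode = "subset" if input_format == "webdataset" else "download"

    def select_output(self, sample: Sample) -> Sample:
        # TODO item 2: decide what to write based on a parameter.
        if self.output_keys is None:
            return dict(sample)
        return {k: sample[k] for k in self.output_keys if k in sample}

    def run(self, samples: Iterable[Sample]) -> List[Sample]:
        # A real worker would hand these to a sample writer; here we just return them.
        return [self.select_output(s) for s in samples]
```

With this shape, `UnifiedWorker("webdataset", output_keys=["metadata"])` would take the subset path and keep only metadata, while `UnifiedWorker("csv")` would take the download path and keep all streams.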

iejMac, May 30 '23 14:05

try this - https://github.com/iejMac/video2dataset/blob/28ab1e5052d77a7d26e979cc7e1f181a714659b7/video2dataset/workers/download_worker.py#L149

with this - https://github.com/iejMac/video2dataset/blob/28ab1e5052d77a7d26e979cc7e1f181a714659b7/video2dataset/workers/subset_worker.py#L133

and that already makes things way more similar

iejMac, Jun 05 '23 01:06

and then maybe we need something that wraps dataloader so it returns things the same way as data_reader? or the other way around? I feel like dict should be the return type
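One way the wrapping idea could look, as a minimal sketch. It assumes (this is not confirmed by the linked code) that one source yields tuples and the other yields dicts; `as_dict_samples` and the field names `key`, `streams`, `meta` are made up for illustration.

```python
from typing import Any, Dict, Iterable, Iterator, Sequence


def as_dict_samples(
    source: Iterable[Any],
    field_names: Sequence[str] = ("key", "streams", "meta"),  # hypothetical names
) -> Iterator[Dict[str, Any]]:
    """Wrap any sample iterator so that every item comes out as a dict."""
    for item in source:
        if isinstance(item, dict):
            # Already in the target shape, pass it through unchanged.
            yield item
            continue
        if len(item) != len(field_names):
            raise ValueError(
                f"expected {len(field_names)} fields, got {len(item)}"
            )
        # Tuple-style sample: pair each position with its field name.
        yield dict(zip(field_names, item))
```

The worker loop could then iterate over `as_dict_samples(...)` regardless of which reader produced the samples, so dict is the only return type it has to handle.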

iejMac, Jun 05 '23 01:06