Unify workers
DownloadWorker, SubsetWorker, and OpticalFlowWorker are all different parameterizations of the same "worker" class that we need to create.
Some important things:
- The difference between DownloadWorker and SubsetWorker is the input format, i.e. if input_format=="webdataset" then Subset, else Download.
- The difference between DownloadWorker and OpticalFlowWorker is the output, i.e. what gets written. In DownloadWorker the output is all the streams; in OpticalFlowWorker the output is metadata.

This needs to be added:
TODO:
- [ ] decide between Subset and Download based on input format and unify them
- [ ] decide what to write based on param (chosen metadata or all streams)
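
A rough sketch of how the two TODO items above could fit into one class. Everything here is hypothetical and not existing video2dataset code: `WorkerConfig`, `UnifiedWorker`, the `output` parameter, and the sample layout (`"streams"` / `"metadata"` keys) are assumptions made only to illustrate the idea.

```python
from dataclasses import dataclass
from typing import Any, Dict, Iterable, Iterator


@dataclass
class WorkerConfig:
    """Hypothetical config; field names are assumptions for this sketch."""
    input_format: str = "csv"
    output: str = "streams"  # "streams" (Download-like) or "metadata" (OpticalFlow-like)


def choose_input_mode(input_format: str) -> str:
    # Subset behaviour for webdataset input, Download behaviour for everything else.
    return "subset" if input_format == "webdataset" else "download"


def select_output(sample: Dict[str, Any], output: str) -> Dict[str, Any]:
    # Decide what gets written based on a single parameter.
    if output == "streams":
        return {"streams": sample["streams"]}
    if output == "metadata":
        return {"metadata": sample["metadata"]}
    raise ValueError(f"unknown output mode: {output!r}")


class UnifiedWorker:
    """One worker class; the old workers become parameterizations of it."""

    def __init__(self, config: WorkerConfig) -> None:
        self.config = config
        self.input_mode = choose_input_mode(config.input_format)

    def process(self, samples: Iterable[Dict[str, Any]]) -> Iterator[Dict[str, Any]]:
        # Samples are assumed to already be dicts with "streams" and "metadata".
        for sample in samples:
            yield select_output(sample, self.config.output)
```

With this shape, `WorkerConfig(input_format="webdataset")` would behave like SubsetWorker and `WorkerConfig(output="metadata")` like OpticalFlowWorker, assuming the actual reading and processing steps are plugged in where `process` currently just forwards the sample.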
Try this: https://github.com/iejMac/video2dataset/blob/28ab1e5052d77a7d26e979cc7e1f181a714659b7/video2dataset/workers/download_worker.py#L149
with this: https://github.com/iejMac/video2dataset/blob/28ab1e5052d77a7d26e979cc7e1f181a714659b7/video2dataset/workers/subset_worker.py#L133
That alone already makes things much more similar.
Then we may need something that wraps the dataloader so it returns things the same way as data_reader, or the other way around. I think a dict should be the return type.
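
A minimal sketch of the "dict as the common return type" idea, assuming one source yields tuples and the other already yields dicts. The `(key, streams, metadata)` tuple layout and the function names are assumptions for illustration, not the actual return values of data_reader or the dataloader.

```python
from typing import Any, Dict, Iterable, Iterator, Tuple, Union

# Hypothetical tuple layout: (key, streams, metadata).
TupleSample = Tuple[str, Dict[str, bytes], Dict[str, Any]]
Sample = Union[TupleSample, Dict[str, Any]]


def to_sample_dict(item: Sample) -> Dict[str, Any]:
    """Normalize one item to a dict, whichever source it came from."""
    if isinstance(item, dict):
        # Already in the target shape; return a shallow copy.
        return dict(item)
    key, streams, metadata = item
    return {"key": key, "streams": streams, "metadata": metadata}


def unified_iter(source: Iterable[Sample]) -> Iterator[Dict[str, Any]]:
    """Wrap any sample source so downstream code only ever sees dicts."""
    for item in source:
        yield to_sample_dict(item)
```

Wrapping at the iterator level keeps both sources untouched, so the same downstream code can consume either one.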