webdataset
Why is `wds.Processor` not included in the `v2` or `main` branch?
I was going through the documentation and it points to using wds.Processor (here) to add a preprocessing pipeline to the data.
However, in the main branch, this Processor class is no longer available. Is that intentional? If so, what is the workaround for adding preprocessing steps to the data? I need access to all the information in the sample during the preprocessing step.
Same for wds.Shorthands and wds.Composable (here)
Sorry, I will have to update the documentation.
It's no longer included because the pipeline architecture has changed to be more in line with torchdata.
I'm not sure what you mean by "having access to all the information"; if you write map(f), the function f gets the complete sample as an argument. Furthermore, you can also write pipeline stages as callables:
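    from webdataset import WebDataset

    def process(source):
        # source is an iterator over complete samples
        for sample in source:
            # ... code goes here ...
            yield sample  # a stage has to yield the samples it passes on

    ds = WebDataset(...).compose(process)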
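To make the two options above concrete, here is a minimal sketch that runs without webdataset installed. It assumes samples are plain dicts; the `cls` field, the `label` field, and the helper name `preprocess` are made up for illustration and are not part of the library. It only shows the shape of a per-sample function (the kind you would pass to `map`) and of a stage callable (the kind you would pass to `compose`).

```python
def preprocess(sample):
    """Per-sample function: receives the complete sample dict."""
    out = dict(sample)  # copy so the input sample is left untouched
    # Hypothetical field names: decode a class label stored as bytes.
    out["label"] = int(sample["cls"].decode())
    return out


def process(source):
    """Pipeline stage: receives an iterator of samples and yields samples."""
    for sample in source:
        if "cls" not in sample:
            continue  # a stage can also drop samples, which map cannot do
        yield preprocess(sample)


# Stand-in for the stream of samples a dataset would produce.
samples = [
    {"__key__": "a", "cls": b"3"},
    {"__key__": "b"},
    {"__key__": "c", "cls": b"7"},
]
result = list(process(iter(samples)))
```

With a real dataset, `preprocess` would go into `.map(preprocess)` and `process` into `.compose(process)`. The stage form is the more general one, since it can filter, batch, or reorder samples rather than transform them one at a time.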