webdataset
Ability to use globbing
When I read my datasets from disk, all the shard files are already there, so I should not need to know their names in advance. In this case it would be great if I could use globbing, i.e.
dataset-*.tar
instead of
dataset-{000000..000010}.tar
Instead of a string, you can pass a list, so you can just write:
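from glob import glob
from webdataset import WebDataset
shards = sorted(glob("dataset-*.tar"))  # sorted for a stable shard order
dataset = WebDataset(shards)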
(TODO: add support for iterables in addition to lists)
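As a rough sketch of the list-based approach, the globbing step could be wrapped in a small helper. The name expand_shards is hypothetical and not part of webdataset; it only uses the standard-library glob module, sorts the matches (glob does not guarantee an order), and fails early if nothing matches.

```python
from glob import glob


def expand_shards(pattern):
    """Return the sorted list of shard paths matching a glob pattern.

    Hypothetical helper for illustration; not part of webdataset.
    """
    # glob() returns matches in arbitrary order, so sort for reproducibility.
    shards = sorted(glob(pattern))
    if not shards:
        # Fail early instead of passing an empty shard list downstream.
        raise FileNotFoundError(f"no shards match {pattern!r}")
    return shards


# Usage (assumes the webdataset package is installed):
#   from webdataset import WebDataset
#   dataset = WebDataset(expand_shards("dataset-*.tar"))
```

Raising on an empty match is a design choice made here for the sketch: a mistyped pattern is then reported immediately rather than producing a dataset with no shards.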