
Ability to use globbing

Open • nils-werner opened this issue 3 years ago • 1 comment

When I read my datasets from local disk, all shard names can be discovered from the filesystem, so I should not have to specify them in advance. In this case it would be great if I could use globbing, i.e.

dataset-*.tar

instead of

dataset-{000000..000010}.tar

nils-werner, Feb 22 '22 15:02

Instead of a string, you can pass a list, so you can just write:

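from glob import glob
from webdataset import WebDataset
shards = sorted(glob("dataset-*.tar"))  # sort: glob returns matches in arbitrary order
dataset = WebDataset(shards)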

(TODO: add support for iterables in addition to lists)

tmbdev, Feb 25 '22 17:02
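
Until iterables are supported directly, the workaround above could be wrapped in a small helper. The following is only a sketch: `expand_shards` is a hypothetical name, not part of webdataset, and it assumes the shards live on a local filesystem that `glob` can see. It accepts a single pattern or any iterable of patterns and always returns a plain, sorted list.

```python
from glob import glob
from typing import Iterable, List, Union


def expand_shards(patterns: Union[str, Iterable[str]]) -> List[str]:
    """Expand glob pattern(s) into a sorted list of local shard paths.

    Hypothetical helper for illustration; not a webdataset API.
    """
    # Materialize the input first so generators are only consumed once.
    pattern_list = [patterns] if isinstance(patterns, str) else list(patterns)

    shards = set()
    for pattern in pattern_list:
        shards.update(glob(pattern))

    # Fail early instead of handing an empty shard list to the loader.
    if not shards:
        raise FileNotFoundError(f"no shards matched: {pattern_list!r}")

    # glob gives no ordering guarantee, so sort for a reproducible list.
    return sorted(shards)
```

The returned list can then be passed to `WebDataset(...)` exactly as in the reply above, e.g. `WebDataset(expand_shards("dataset-*.tar"))`.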