Tom

Results: 170 comments by Tom

The README notebook is executing correctly with v2.

The caching code has been rewritten and this should not be an issue anymore.

That's correct: "with_epoch" just slices the stream based on items, not samples. You'll have to divide the desired epoch size by the batch size. Is there any reason this might...
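The arithmetic can be sketched as follows. This is only an illustration, assuming the stream has already been batched before the epoch length is applied; `samples_per_epoch` and `batch_size` are hypothetical values, not taken from the original thread.

```python
# Hypothetical values for illustration only.
samples_per_epoch = 10_000
batch_size = 64

# The epoch length counts items in the stream. After batching, each item is
# one batch, so the value to pass is the number of batches, not of samples.
batches_per_epoch = samples_per_epoch // batch_size

print(batches_per_epoch)
```

The resulting `batches_per_epoch` is the value that would be passed to `with_epoch` in this setup. Integer division drops the final partial batch; round up instead if that batch should be counted.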

Can you attach the output of `tar tvf shard.tar` please? This error is triggered when you have a file that contains repeated file names, something like: ``` dir/base.jpg dir/base.json dir/base.jpg...

Yes, that will trigger this error. File names are supposed to be distinct in WebDataset files; that's just a very useful convention, and it is needed to segment tar files...
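One way to check a shard for repeated file names before loading it is sketched below, using only the Python standard library. The helper names `repeated_names` and `make_tar` are hypothetical, and the in-memory archive stands in for a real shard.

```python
import io
import tarfile
from collections import Counter


def repeated_names(tar_bytes):
    """Return the member names that occur more than once in a tar archive."""
    with tarfile.open(fileobj=io.BytesIO(tar_bytes)) as tar:
        counts = Counter(member.name for member in tar.getmembers())
    return sorted(name for name, n in counts.items() if n > 1)


def make_tar(names):
    """Build a small in-memory tar with one tiny file per name (demo helper)."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w") as tar:
        for name in names:
            data = b"x"
            info = tarfile.TarInfo(name=name)
            info.size = len(data)
            tar.addfile(info, io.BytesIO(data))
    return buf.getvalue()


# A shard in which dir/base.jpg appears twice.
shard = make_tar(["dir/base.jpg", "dir/base.json", "dir/base.jpg"])
print(repeated_names(shard))
```

For a shard on disk, the same check works by opening the file with `tarfile.open(path)` instead of wrapping bytes in `io.BytesIO`.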

I have added a rename_files option that allows you to rename files from the tar file prior to grouping by the group_by_keys function. There is also a select_files argument that...

File names inside tar files must be unique across .tar files. That's because webdataset considers the entire collection of tar files to be the dataset and requires unique file names...

It's pretty much what it says: the `curl` command retrieved something that isn't a `tar` file. That's probably because you pointed it at the wrong URL. Most web servers will...
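A quick way to confirm whether a downloaded file is really a tar archive is sketched below with the standard library. The file names and the simulated HTML error page are made up for the demonstration; `looks_like_tar` is a hypothetical helper.

```python
import os
import tarfile
import tempfile


def looks_like_tar(path):
    """Return True if the file at `path` can be opened as a tar archive."""
    return tarfile.is_tarfile(path)


workdir = tempfile.mkdtemp()

# Simulate a download that returned an HTML error page instead of a shard.
bad_path = os.path.join(workdir, "bad-shard.tar")
with open(bad_path, "wb") as f:
    f.write(b"<html><body>404 Not Found</body></html>")

# A real (tiny) tar archive for comparison.
payload_path = os.path.join(workdir, "sample.txt")
with open(payload_path, "w") as f:
    f.write("hello")
good_path = os.path.join(workdir, "good-shard.tar")
with tarfile.open(good_path, "w") as tar:
    tar.add(payload_path, arcname="sample.txt")

print(looks_like_tar(bad_path))
print(looks_like_tar(good_path))
```

If the check fails, looking at the first few bytes of the file usually shows what the server actually returned.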

Shards are accessed individually. By default, all data is handled streaming, meaning nothing is cached locally. If you use caching, each individual shard is first downloaded completely, then the local...

You can override the url_to_name argument to map shard names to cache file names in any way you like. I will try to improve the default.
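As a rough sketch of such a mapping: the function below turns a shard URL into a cache file name that stays distinct even when two shards share a base name. The one-string-in, one-string-out signature is an assumption about what `url_to_name` expects, and the hashing scheme and example URLs are purely illustrative.

```python
import hashlib
import posixpath
from urllib.parse import urlparse


def url_to_name(url):
    """Hypothetical mapping from a shard URL to a cache file name.

    A short hash of the full URL is prefixed to the base name so that
    shards with the same base name in different locations do not collide.
    """
    base = posixpath.basename(urlparse(url).path) or "shard"
    digest = hashlib.sha256(url.encode("utf-8")).hexdigest()[:12]
    return f"{digest}-{base}"


# Example URLs are made up for illustration.
name_a = url_to_name("https://example.com/train/shard-000000.tar")
name_b = url_to_name("https://example.com/val/shard-000000.tar")
print(name_a)
print(name_b)
```

Keeping the original base name at the end makes cached files easy to recognize, while the hash prefix keeps them unique.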