Tom

Results: 170 comments of Tom

If no caching is enabled, then the shards are not downloaded before processing. Also, I don't see where HEAD requests would come from without caching. Are you sure you haven't...

I don't understand the bug report. You say that "Tar downloads of all files (takes 93 seconds)" and "Batch 1 completes (613 seconds after tar download complete)". But WebDataset doesn't...

Thanks for the report, and sorry for the late response: if you have a PR, I'll be happy to include it; otherwise, I'll look into this.

Until this gets fixed, you can simply use the Pipeline interface for opening the URL.

You can find a DDP training example here: https://github.com/webdataset/webdataset-imagenet

At this point, there are several ways of dealing with DDP training: (1) use node splitting and Join (this gives you...

> If resample=True, will it lead to different devices getting the same data?

It should. The RNGs are initialized differently on each host and worker.

> Besides, does the with_epoch(1000)...
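A toy, pure-Python model of the per-worker RNG behavior (the names here are illustrative, not WebDataset API):

```python
import random

shards = [f"shard-{i:06d}.tar" for i in range(10)]

def resampled_stream(worker_seed, n):
    # Each worker seeds its own RNG: a given seed always reproduces
    # the same shard sequence, while differently seeded workers
    # generally draw different sequences.
    rng = random.Random(worker_seed)
    return [rng.choice(shards) for _ in range(n)]

# Same seed -> identical stream (deterministic per worker).
assert resampled_stream(0, 5) == resampled_stream(0, 5)
```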

@jrcavani Resampling is a typical strategy in statistics to generate slight variations of the dataset. It is used for various statistical estimators. As such, you can view it as a...
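For the statistical sense of resampling, a minimal bootstrap sketch (illustrative only, not WebDataset code):

```python
import random

data = [2, 4, 4, 4, 5, 5, 7, 9]  # toy sample; its mean is 5.0

def bootstrap_means(values, n_resamples, seed=0):
    # Resample with replacement and record each resample's mean --
    # the classic bootstrap estimate of sampling variability.
    rng = random.Random(seed)
    return [
        sum(rng.choice(values) for _ in values) / len(values)
        for _ in range(n_resamples)
    ]

means = bootstrap_means(data, 1000)
```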

@laolongboy

> Same question. How to set the dataloader's num_worker to get the correct number of batches for each epoch?

Short answer: use `.with_epoch` on the WebLoader, not the WebDataset. Note that...
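Conceptually, `.with_epoch(n)` just slices a (possibly infinite) stream into fixed-length epochs; a pure-Python sketch of that idea (not the actual implementation):

```python
from itertools import count, islice

def with_epoch(stream, n):
    # Toy model of .with_epoch(n): repeatedly cut exactly n items
    # off the underlying stream and call each chunk an "epoch".
    while True:
        epoch = list(islice(stream, n))
        if not epoch:
            return
        yield epoch

batches = (f"batch-{i}" for i in count())  # stand-in infinite loader
epochs = with_epoch(batches, 1000)
first = next(epochs)   # batch-0 .. batch-999
second = next(epochs)  # batch-1000 .. batch-1999
```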

@HuangChiEn The short answer is: use `wids` and `ShardListDataset`. It behaves just like any other indexed dataset, including for distributed training. We implemented this because distributed...

Sorry, I need to update that worksheet. In v2, ShardList is now called SimpleShardList, and the splitter argument is now called nodesplitter.