webdataset

A high-performance Python-based I/O system for large (and small) deep learning problems, with strong support for PyTorch.

185 webdataset issues

Hi, is there a way to set the seed for the shuffle function, for reproducibility? Thanks!
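To make the question concrete, here is a minimal sketch of a buffered shuffle driven by a private, seeded random number generator. It is plain Python and is not webdataset's own shuffle; the function name `seeded_buffer_shuffle` and its parameters `bufsize` and `seed` are hypothetical and chosen only for illustration.

```python
import random
from typing import Iterable, Iterator, List, TypeVar

T = TypeVar("T")


def seeded_buffer_shuffle(samples: Iterable[T], bufsize: int, seed: int) -> Iterator[T]:
    """Yield samples in a pseudo-random order determined by `seed`.

    Hypothetical helper for illustration; not part of webdataset.
    """
    # A private RNG keeps the order independent of the global random state.
    rng = random.Random(seed)
    buf: List[T] = []
    for sample in samples:
        buf.append(sample)
        if len(buf) >= bufsize:
            # Move a randomly chosen element to the end, then emit it.
            i = rng.randrange(len(buf))
            buf[i], buf[-1] = buf[-1], buf[i]
            yield buf.pop()
    # Emit whatever remains in the buffer, in seeded random order.
    rng.shuffle(buf)
    yield from buf


first = list(seeded_buffer_shuffle(range(100), bufsize=10, seed=0))
second = list(seeded_buffer_shuffle(range(100), bufsize=10, seed=0))
print(first == second)  # the same seed reproduces the same order
```

With multiple data-loading workers, each worker would presumably need its own seed derived from a base seed and the worker index, otherwise all workers would shuffle identically.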

add testcase

I actually wrote a gist implementing this: https://gist.github.com/harpone/3b6003c22295a50cbd3d2cfc566dc115 Uses the torch-xla distributed MpDeviceLoader with shard splitting across accelerators and workers, with checks that all the minibatches are indeed unique. Just...

documentation
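The shard-splitting idea mentioned in the snippet above can be sketched in plain Python. This is not the code from the linked gist and does not use torch-xla; the function `split_shards`, its parameters, and the shard names are assumptions made up for illustration. Each (accelerator, worker) pair takes a strided slice of the shard list, and a final check confirms that no shard is assigned twice.

```python
from typing import List


def split_shards(
    shards: List[str],
    rank: int,
    world_size: int,
    worker_id: int,
    num_workers: int,
) -> List[str]:
    """Return the shards owned by one worker on one accelerator.

    Hypothetical helper for illustration; not part of webdataset or torch-xla.
    """
    # Flatten (rank, worker_id) into one global consumer index.
    slot = rank * num_workers + worker_id
    total = world_size * num_workers
    # Strided slicing gives every consumer a disjoint subset.
    return shards[slot::total]


# Hypothetical shard names, 2 accelerators with 4 workers each.
shards = [f"shard-{i:04d}.tar" for i in range(16)]
assigned = [
    split_shards(shards, rank, 2, worker, 4)
    for rank in range(2)
    for worker in range(4)
]

# Check that every shard is assigned exactly once.
flat = [name for part in assigned for name in part]
assert len(flat) == len(set(flat)) == len(shards)
```

Disjoint shards only guarantee unique minibatches if the shards themselves contain no duplicate samples, which is why an explicit uniqueness check like the one the snippet describes is still useful.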

First of all, thanks for the amazing lib! I hope it makes it into PyTorch core soon. I see that the imagenet example performs no casting to long in order...

enhancement

When using webdataset with pytorch-lightning, I discovered that if I pass dataloaders to pytorch-lightning as instances of MultiDataset, training will stall on epoch 0. Once I changed the dataloaders to...

enhancement

ShardWriter does not seem to work with a gcloud URL. However, setting stream to self.fname at this line seems to solve the problem: https://github.com/webdataset/webdataset/blob/main/webdataset/writer.py#L406 Is there a reason the file...

enhancement
faq

1. Add more extensions supported by Pillow 2. Fix the test error in test_gopen

I'm trying to use webdataset in CI, but it fails when webdataset caching is used. To reproduce, use the following Dockerfile: ```docker FROM ubuntu:20.04 ENV LANG=C.UTF-8 RUN apt-get update &&...

errormessage

Hi, I'm using webdataset with S3 and multiple shards. I'm using automatic sharding to avoid downloading the data more than once. It's not clear from the docs if webdataset downloads...

documentation

``` Original Traceback (most recent call last): File "/home/dome/.local/lib/python3.7/site-packages/torch/utils/data/_utils/worker.py", line 302, in _worker_loop data = fetcher.fetch(index) File "/home/dome/.local/lib/python3.7/site-packages/torch/utils/data/_utils/fetch.py", line 32, in fetch data.append(next(self.dataset_iter)) File "/home/dome/.local/lib/python3.7/site-packages/webdataset/pipeline.py", line 68, in iterator for...

bug

- Problem: I am trying to convert the [ImageNet-C](https://drive.google.com/drive/folders/1HDVw6CmX3HiG0ODFtI75iIfBDxSiSz2K) dataset (an image classification dataset) into the webdataset tar file format. The original dataset includes 4 tar files that store image samples. The size of...

bug