webdataset

A high-performance Python-based I/O system for large (and small) deep learning problems, with strong support for PyTorch.

Results 185 webdataset issues

It appears that `PIL` is not imported correctly in the `imageencoder`. Currently, it imports `PIL` directly via `import PIL` and refers to `Image` as `PIL.Image`, which causes the no attribute...

bug
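The behaviour reported above is consistent with a general Python rule: importing a package does not, by itself, import its submodules, so `package.submodule` can raise `AttributeError` until the submodule is imported explicitly. The sketch below shows the rule with a throwaway package written to a temporary directory. The names `demo_pkg` and `image` are hypothetical stand-ins, not Pillow or webdataset code, and whether this is the actual cause in `imageencoder` is an assumption based on the truncated report.

```python
import importlib
import sys
import tempfile
from pathlib import Path


def submodule_visibility():
    """Return (before, after): whether `demo_pkg.image` is reachable as an
    attribute before and after the submodule is imported explicitly."""
    with tempfile.TemporaryDirectory() as tmp:
        # Build a hypothetical package: demo_pkg/__init__.py and demo_pkg/image.py
        pkg = Path(tmp) / "demo_pkg"
        pkg.mkdir()
        (pkg / "__init__.py").write_text("")  # empty: imports no submodules
        (pkg / "image.py").write_text("Image = 'placeholder'\n")

        sys.path.insert(0, tmp)
        try:
            importlib.invalidate_caches()
            import demo_pkg  # the package only

            before = hasattr(demo_pkg, "image")  # submodule not loaded yet

            import demo_pkg.image  # explicit submodule import

            after = hasattr(demo_pkg, "image")
        finally:
            # Leave the interpreter state as we found it.
            sys.path.remove(tmp)
            sys.modules.pop("demo_pkg.image", None)
            sys.modules.pop("demo_pkg", None)
        return before, after


if __name__ == "__main__":
    print(submodule_visibility())
```

With Pillow, the usual way to avoid relying on some other module having loaded the submodule first is to write `from PIL import Image` or `import PIL.Image`.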

I was going through the documentation and it points to using `wds.Processor` ([here](https://webdataset.github.io/webdataset/howitworks/)) to add a preprocessing pipeline to the data. However, in the `main` branch, this `Processor` class is...

documentation

Hi, I'm trying to subsample a large dataset (LAION 400M) stored in the WebDataset format and was hoping for some pointers. Based on the documentation alone, I am unsure...

documentation
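One way to approach the subsampling question above without any library support is to filter each shard directly, since a WebDataset shard is an ordinary tar file in which consecutive members sharing a basename form one sample. The following is a minimal sketch using only the standard library. The helper names `sample_key` and `subsample_shard` are hypothetical, and it assumes the members of each sample are stored next to each other in the tar. It is not the method recommended by the webdataset documentation.

```python
import io
import random
import tarfile


def sample_key(member_name):
    """Sample key: the path up to the first dot of the file's basename."""
    dirname, _, base = member_name.rpartition("/")
    stem = base.split(".", 1)[0]
    return f"{dirname}/{stem}" if dirname else stem


def subsample_shard(src, dst, fraction, seed=0):
    """Copy roughly `fraction` of the samples from tar stream `src` to `dst`.

    `src` and `dst` are binary file objects. All files belonging to a kept
    sample are copied together. Returns the number of samples kept.
    """
    rng = random.Random(seed)
    kept = 0
    current_key, keep = None, False
    with tarfile.open(fileobj=src, mode="r|*") as tin, \
            tarfile.open(fileobj=dst, mode="w") as tout:
        for member in tin:
            if not member.isfile():
                continue
            key = sample_key(member.name)
            if key != current_key:
                # First file of a new sample: decide once for the whole sample.
                current_key = key
                keep = rng.random() < fraction
                kept += int(keep)
            if keep:
                tout.addfile(member, tin.extractfile(member))
    return kept


if __name__ == "__main__":
    # Build a tiny in-memory shard with 4 samples of two files each.
    shard = io.BytesIO()
    with tarfile.open(fileobj=shard, mode="w") as t:
        for i in range(4):
            for ext, data in (("jpg", b"x"), ("cls", b"1")):
                info = tarfile.TarInfo(f"{i:04d}.{ext}")
                info.size = len(data)
                t.addfile(info, io.BytesIO(data))
    shard.seek(0)
    out = io.BytesIO()
    print(subsample_shard(shard, out, fraction=0.5, seed=1), "samples kept")
```

Deciding per sample rather than per file keeps each sample's files together in the output shard.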

When using `DDP` during training, the documentation says to use the `ddp_equalize` method to make sure each worker processes the same number of batches [[doc](https://webdataset.github.io/webdataset/multinode/#distributeddataparallel)]. However, when running this: ```python...

documentation
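The idea behind equalizing, as described above, is that every rank must yield exactly the same number of batches, otherwise ranks that finish early leave the others waiting in a collective operation. The sketch below illustrates that idea in plain Python. The functions `batches_per_rank` and `equalize` are hypothetical helpers written for this example and are not the implementation of `ddp_equalize` in webdataset.

```python
def batches_per_rank(num_samples, batch_size, world_size):
    """Number of full batches each rank can process if samples are split evenly."""
    return num_samples // (batch_size * world_size)


def equalize(make_batches, target):
    """Yield exactly `target` batches.

    `make_batches` is a zero-argument callable returning a fresh iterator over
    this rank's batches. The stream is truncated if it is too long and
    restarted from the beginning if it runs short.
    """
    produced = 0
    while produced < target:
        empty = True
        for batch in make_batches():
            empty = False
            yield batch
            produced += 1
            if produced == target:
                return
        if empty:
            raise ValueError("source yielded no batches; cannot reach target")


if __name__ == "__main__":
    target = batches_per_rank(num_samples=1000, batch_size=32, world_size=4)
    # A rank whose shards only provide 5 batches is padded by repetition.
    short_rank = list(equalize(lambda: iter(range(5)), target))
    # A rank with 9 batches is truncated to the same length.
    long_rank = list(equalize(lambda: iter(range(9)), target))
    print(target, len(short_rank), len(long_rank))
```

Repeating batches means a few samples are seen more than once per epoch. That is the trade-off this sketch accepts in exchange for equal batch counts on every rank.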

I use webdataset in SpeechBrain, and I found two cases in which the process hangs in DDP training: 1. `num_workers` is set too large, set it to 1 is...

bug

I am trying to implement a new dataloader that loads a tar file from S3. This tar file contains `.npz` arrays and `.cls` files with a class. `.cls` files load just...

bug

I found that the released versions have no release notes. It is hard for users/developers to know what changed between two versions. Thanks a lot!

enhancement

I've been trying to debug and resolve a number of distributed training shuffle issues recently, and I've found some alarming issues... 1. There is no way to have a reliable epoch...

enhancement

Hi, I'm currently using WebDataset to stream data from S3 for my multi-GPU PyTorch training job (using multiprocessing DataLoader workers for each GPU, as in a typical PyTorch job). For some...

bug