webdataset

A high-performance Python-based I/O system for large (and small) deep learning problems, with strong support for PyTorch.

185 webdataset issues

I tried to stream `.tar` and `.tar.gz` files from S3-compatible storage, and it seems that, due to a slightly slow connection or something similar, we might read and get the...

Hey, amazing work! I was looking at the LAION-400 dataset, and it mentions that 1 TB of precomputed CLIP embeddings is available. However, I cannot find a way to just download...

Hi @tmbdev! Any chance we could get a release of [v0.2.107](https://github.com/webdataset/webdataset/releases/tag/v0.2.107) on PyPI? The [most recent PyPI release](https://pypi.org/project/webdataset/#history) is v0.2.100, which is missing a couple of very useful fixes (including https://github.com/webdataset/webdataset/pull/394).

bug

Description: There's an inconsistency in how filenames are filtered between the `IndexTarSamples` and `MMIndexedTar` classes that can lead to index misalignment and potential data corruption. Current behavior: 1. `IndexTarSamples`...

bug
enhancement

I would like help loading a WebDataset that contains embeddings generated from prompts by a `text_encoder` for multi-GPU training. If anyone could help, I would be...