webdataset
A high-performance Python-based I/O system for large (and small) deep learning problems, with strong support for PyTorch.
I tried to stream `.tar` and `.tar.gz` files from S3-compatible storage, and it seems that, due to a slightly slow connection or something similar, we might read and get the...
Hey, amazing work!! I was looking at the LAION-400 dataset, and it mentions that 1 TB of precomputed CLIP embeddings is available. However, I cannot find a way to just download...
Hi @tmbdev! Any chance we could get a release of [v0.2.107](https://github.com/webdataset/webdataset/releases/tag/v0.2.107) on PyPI? The [most recent PyPI release](https://pypi.org/project/webdataset/#history) is v0.2.100, which is missing a couple of very useful fixes (including https://github.com/webdataset/webdataset/pull/394).
### Description

There's an inconsistency in how filenames are filtered between the `IndexTarSamples` and `MMIndexedTar` classes that can lead to index misalignment and potential data corruption.

### Current Behavior

1. `IndexTarSamples`...
I would like to seek help on how to load a WebDataset containing embeddings generated from prompts through a text_encoder for multi-GPU training. If anyone could help, I would be...