webdataset

A high-performance Python-based I/O system for large (and small) deep learning problems, with strong support for PyTorch.
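Several issues below mention tar files and shards. As a rough illustration of the storage convention involved (files in a tar archive that share a basename are treated as one sample), here is a minimal sketch using only the Python standard library. It does not call the webdataset package; `write_shard` and `read_shard` are hypothetical helper names introduced for this example, and the sample keys and fields are made up.

```python
import io
import tarfile
from collections import defaultdict


def write_shard(samples):
    """Pack samples into an in-memory tar archive, one member per (key, extension).

    Hypothetical helper for illustration; not part of the webdataset API.
    """
    buffer = io.BytesIO()
    with tarfile.open(fileobj=buffer, mode="w") as tar:
        for key, fields in samples:
            for ext, payload in fields.items():
                info = tarfile.TarInfo(name=f"{key}.{ext}")
                info.size = len(payload)
                tar.addfile(info, io.BytesIO(payload))
    return buffer.getvalue()


def read_shard(shard_bytes):
    """Group tar members into samples by the part of the name before the first dot."""
    grouped = defaultdict(dict)
    with tarfile.open(fileobj=io.BytesIO(shard_bytes), mode="r") as tar:
        for member in tar:
            key, _, ext = member.name.partition(".")
            grouped[key][ext] = tar.extractfile(member).read()
    return dict(grouped)


# Made-up sample data: each sample has a text field and a class label.
samples = [
    ("sample0000", {"txt": b"a cat", "cls": b"0"}),
    ("sample0001", {"txt": b"a dog", "cls": b"1"}),
]
shard = write_shard(samples)
restored = read_shard(shard)
```

Because a sample is just a group of adjacent tar members, a reader can stream the archive sequentially without an index, which is the property most of the sharding questions below depend on.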

Results 185 webdataset issues

Hi, This project is so efficient in loading datasets! My main question is how to handle adding metadata to large datasets. Do we need to rebuild the tar each time...

I am trying to create a distributed preprocessing pipeline using tensorcom. I have successfully created it and I can access the data from a different machine. However, I am trying...

documentation

I have recently encountered an error when trying to iterate through a locally-stored dataset. I found this recent [issue](https://stackoverflow.com/questions/68299665/valueerror-no-gopen-handler-defined/68377432#68377432) on StackOverflow and have posted a temporary solution there. Is there...

documentation

Related: #21, #46. Hi, I tried to stream images from S3 buckets in PyTorch using this project, but the image loaded from S3 seems broken. ```python In [0]: url...

documentation

Hi! Firstly, awesome library! :)) When using Webdataset on a large dataset comprised of thousands of shards, I found that there were often multiple temp files for any given shard...

add testcase

Trying to optimize the TPU pod slice solution discussed in https://github.com/webdataset/webdataset/issues/47. Assuming a scenario where you need to replicate training with epochs, i.e., declare the length of the dataset, distribute equal samples/batches to nodes...

Hi @tmbdev, thanks for sharing the excellent library. When I use webdataset to build my dataset, my training program stops at some random iterations. The GPU utility is still high...

documentation

With current docs, I find it a bit difficult to navigate the code base when constructing custom datasets. For instance, it is not clear whether I need to deal with...

enhancement

I am trying to build a `LightningDataModule` using webdataset, but I have run into some difficulties implementing it, since I have never used `IterableDataset` before. The first thing I tried is [this...

enhancement

I'm trying to use webdataset for a distributed PyTorch XLA POC. I tried implementing the `ResizedDataset` class but started receiving many errors like the following after ~40 training steps. Any...