webdataset

A high-performance Python-based I/O system for large (and small) deep learning problems, with strong support for PyTorch.

Results: 185 webdataset issues (sorted by recently updated)

Hey! Thanks for your great work on this library. I encountered very weird training behaviour with wds. I generated my tar files all in one folder as follows: ``` training_faa0dfb1-9775-4279-8146-f251997958c4.tar...

Labels: documentation, errormessage

Hi, I'm confused about the new README "complete pipeline" example. Why does it do `dataset.batched(16)`, then `wds.WebLoader(..., batch_size=8)`, then `.unbatched()`, then `.batched(12)`? It says, "batch in the dataset and then...
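The batch, unbatch, rebatch sequence asked about here can be illustrated without webdataset itself. The following is a minimal sketch, assuming plain Python lists in place of collated tensors. `batched` and `unbatched` are stand-ins written for this example, not the library's implementations (which may also collate samples). It shows that flattening and then rebatching makes the final batch size (12) independent of the batch size used upstream (16).

```python
from itertools import islice
from typing import Iterable, Iterator, List, TypeVar

T = TypeVar("T")


def batched(samples: Iterable[T], batch_size: int) -> Iterator[List[T]]:
    """Group consecutive samples into lists of batch_size (the last one may be short)."""
    it = iter(samples)
    while True:
        batch = list(islice(it, batch_size))
        if not batch:
            return
        yield batch


def unbatched(batches: Iterable[List[T]]) -> Iterator[T]:
    """Flatten a stream of batches back into individual samples."""
    for batch in batches:
        yield from batch


# Upstream stage (e.g. inside loader workers): batches of 16.
worker_batches = list(batched(range(100), 16))
# Downstream stage: flatten, then rebatch to the size the training loop wants.
final_batches = list(batched(unbatched(worker_batches), 12))

print([len(b) for b in worker_batches])  # [16, 16, 16, 16, 16, 16, 4]
print([len(b) for b in final_batches])   # [12, 12, 12, 12, 12, 12, 12, 12, 4]
```

A common motivation for this kind of arrangement is that passing a few large batches between worker processes costs less than passing many single samples. The truncated snippet does not say whether that is the README's stated reason.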

I've been working with 0.2.x recently, updating a project that was using 0.1 and adding support to another. I have a rough template for a custom pipeline, but there are...

Hi, I see that the documentation for multinode still shows that we need to set `nodesplitter`, but that's no longer an argument for `WebDataset`. Is there any reason why this...
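However the current API exposes it, node splitting amounts to giving each node a disjoint subset of the shard list. Here is a minimal sketch of that idea. `split_by_rank` is a hypothetical helper, not a webdataset function. In a real multi-node job, `rank` and `world_size` would typically come from `torch.distributed.get_rank()` and `torch.distributed.get_world_size()`.

```python
from typing import List, Sequence


def split_by_rank(shards: Sequence[str], rank: int, world_size: int) -> List[str]:
    """Hypothetical splitter: node `rank` keeps every world_size-th shard, starting at rank."""
    if not 0 <= rank < world_size:
        raise ValueError("rank must be in [0, world_size)")
    return list(shards[rank::world_size])


shards = [f"data-{i:06d}.tar" for i in range(6)]
for rank in range(2):
    print(rank, split_by_rank(shards, rank, world_size=2))
# 0 ['data-000000.tar', 'data-000002.tar', 'data-000004.tar']
# 1 ['data-000001.tar', 'data-000003.tar', 'data-000005.tar']
```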

Currently it is impossible to re-create bit-exact WebDataset shards, because each file in the tar archive has a different mtime. This has slightly annoying implications for file caching and versioning, as...
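With Python's standard `tarfile` module, a tar file's bytes become reproducible once every header field that would otherwise vary is pinned. This includes mtime, ownership, permissions, and member order. The following is a minimal sketch under that assumption, not webdataset's own writer. The `key.ext` member naming follows the usual WebDataset convention of grouping files by basename, and `write_deterministic_tar` is a hypothetical name.

```python
import io
import tarfile
from typing import Dict


def write_deterministic_tar(samples: Dict[str, Dict[str, bytes]]) -> bytes:
    """Write samples ({key: {ext: data}}) to an in-memory tar whose bytes depend only on content."""
    buf = io.BytesIO()
    # USTAR avoids optional PAX extended headers for short ASCII names.
    with tarfile.open(fileobj=buf, mode="w", format=tarfile.USTAR_FORMAT) as tar:
        for key in sorted(samples):            # stable sample order
            for ext in sorted(samples[key]):   # stable member order within a sample
                data = samples[key][ext]
                info = tarfile.TarInfo(name=f"{key}.{ext}")
                info.size = len(data)
                # Pinned explicitly (these match TarInfo's defaults) instead of
                # copying "now" or the source file's metadata.
                info.mtime = 0
                info.mode = 0o644
                info.uid = info.gid = 0
                info.uname = info.gname = ""
                tar.addfile(info, io.BytesIO(data))
    return buf.getvalue()


shard = {"000000": {"txt": b"hello", "cls": b"0"}}
print(write_deterministic_tar(shard) == write_deterministic_tar(shard))  # True
```

If the shards are also gzip-compressed, the gzip header carries its own timestamp. In that case, `gzip.GzipFile(fileobj=..., mode="wb", mtime=0)` would be needed as well.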

ISSUE: Build fails. CAUSE: Missing line in requirements.txt (see https://github.com/webdataset/webdataset/blob/main/webdataset/shardlists.py#L14). FIX: Add the missing requirement.

Please have a look at https://github.com/webdataset/webdataset/blob/682b30ee484d719a954554654d2d6baa213f9371/webdataset/compat.py#L96-L108. When the input `urls` is a string like `data-{000..123}.tar`, it seems that wds appends both the nodesplitter and the workersplitter twice, which results in the yielded data being...
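To see why applying a splitter twice is a problem, assume that each splitter keeps every n-th shard starting at the worker's index. Under that assumption, running the same split a second time on an already-split list silently discards shards. The following is a minimal sketch of that effect. `split` is a hypothetical stand-in, not the code in `compat.py`.

```python
from typing import List, Sequence


def split(items: Sequence[str], index: int, count: int) -> List[str]:
    """Hypothetical splitter: keep every count-th item starting at index."""
    return list(items[index::count])


shards = [f"data-{i:03d}.tar" for i in range(8)]
num_workers = 2

once = {w: split(shards, w, num_workers) for w in range(num_workers)}
twice = {w: split(once[w], w, num_workers) for w in range(num_workers)}

print(sum(len(v) for v in once.values()))   # 8: every shard is read exactly once
print(sum(len(v) for v in twice.values()))  # 4: half the shards are never read
print(twice)  # {0: ['data-000.tar', 'data-004.tar'], 1: ['data-003.tar', 'data-007.tar']}
```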

Labels: add testcase

I've been trying to reproduce the compose implementation given in the documentation, but copying the source gives me the following error. > import matplotlib.pyplot as plt > import torch.utils.data >...
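The following is a minimal, library-independent sketch of stage composition. It assumes (the truncated snippet does not show it) that the documented pattern is a function that takes an iterator of samples and yields processed samples. `compose`, `add_length`, and `drop_short` are hypothetical names for this example, not webdataset APIs.

```python
from functools import reduce
from typing import Callable, Iterable, Iterator

Stage = Callable[[Iterable[dict]], Iterator[dict]]


def compose(source: Iterable[dict], *stages: Stage) -> Iterator[dict]:
    """Feed the source through each stage in order; each stage maps an iterator to an iterator."""
    return iter(reduce(lambda stream, stage: stage(stream), stages, source))


def add_length(samples: Iterable[dict]) -> Iterator[dict]:
    """Example stage: attach a derived field to every sample."""
    for sample in samples:
        yield {**sample, "length": len(sample["text"])}


def drop_short(samples: Iterable[dict], min_length: int = 3) -> Iterator[dict]:
    """Example stage: filter samples out of the stream."""
    for sample in samples:
        if sample["length"] >= min_length:
            yield sample


source = [{"text": "hi"}, {"text": "hello"}, {"text": "hey"}]
print([s["text"] for s in compose(source, add_length, drop_short)])  # ['hello', 'hey']
```

Because every stage is a generator, samples flow through lazily one at a time. This is the property that makes this style suitable for streaming datasets.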

Labels: documentation

Hi, thanks for this great project. I have a few questions about using WebDataset with TensorFlow. I referred to this [GitHub repository](https://github.com/webdataset/webdataset-tensorflow/blob/main/resnet-multi.py) to set up the data and model trainer...

Labels: bug