webdataset
A high-performance Python-based I/O system for large (and small) deep learning problems, with strong support for PyTorch.
Does the webdataset library have documentation or a website for viewing API info?
This should be simple, but I can't figure it out. I need to use SFTTrainer with WebDataset. It works fine on a single node, but as soon as I use...
Hi, I hope you are doing well! I am encountering an issue when using WebDataset to load `.pt` files into an ML model. Below is a detailed description of the...
My dataloader is not giving all the samples. Did I make a mistake defining the epoch length? I thought the error could be related to the resampling, but in validation...
Hello, thank you for developing this! I am new to WebDataset and I would like to know where I can find documentation on `webdataset` and `wids`. For now, the only...
We are benchmarking WebDataset vs. TFRecord on a tabular problem (the training set is a set of arrays) streaming from S3. We are seeing 2-3X degradation in performance between TFRecord...
Hi, thank you for developing this! I have a directory with protein files where each protein contains shards; this should be the training data for a model. I am using...
I tried the code in https://github.com/webdataset/webdataset/blob/main/examples/column-store.ipynb, but I have trouble using Column Store in a multi-node setting. The example said that any shuffling, decoding, etc. needs to happen...
Hi everyone, I hope you are doing well. I wanted to ask a technical question regarding webdataset. I was trying to implement a custom batch sampler function. The issue is the following,...
Hi, does the [`lmdb_cached`](https://github.com/webdataset/webdataset/blob/90346059ec6a64a950c37c252e38db64db00de0b/webdataset/compat.py#L294) method of the fluid interface work when using the [`WebDataset`](https://github.com/webdataset/webdataset/blob/90346059ec6a64a950c37c252e38db64db00de0b/webdataset/compat.py#L332) class and multiple `DataLoader` workers? In the code I see that once the entire dataset is...