Kevin Tse
It seems that StreamingASR doesn't work with Python 3.10 due to an issue within PyAudio. Executing `python run_sasr.py` leads to the error `[PY_SSIZE_T_CLEAN macro must be defined for '#' formats]`....
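A minimal reproduction sketch, assuming PyAudio <= 0.2.11 on Python 3.10; the exact call where the error surfaces may vary, and upgrading PyAudio (0.2.12 reportedly adds Python 3.10 support) is one possible workaround:

```python
# Repro sketch (assumes PyAudio is installed and a microphone is available).
# On Python 3.10 with PyAudio <= 0.2.11, opening/reading the stream is where
# "PY_SSIZE_T_CLEAN macro must be defined for '#' formats" tends to surface.
import sys
import pyaudio

if sys.version_info >= (3, 10):
    print("warning: PyAudio <= 0.2.11 is known to fail on Python 3.10")

pa = pyaudio.PyAudio()
stream = pa.open(format=pyaudio.paInt16, channels=1, rate=16000,
                 input=True, frames_per_buffer=1024)
chunk = stream.read(1024)   # the error is reportedly raised from the C extension here
stream.stop_stream()
stream.close()
pa.terminate()
```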
Stack from [ghstack](https://github.com/ezyang/ghstack): * __->__ #734 This PR is primarily focused on adding more datasets for benchmarking. Notable changes that are in progress: - Using `PrototypeMultiprocessingReadingService` as that will become...
Stack from [ghstack](https://github.com/ezyang/ghstack): * __->__ #724 This PR adds RandomSplitter without a buffer. The upside is that this uses less memory (good for memory-bound cases), but the downsides are 1)...
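As an illustration of the trade-off only (not the torchdata implementation), a buffer-free random split can keep memory constant by re-walking the source once per split and keeping just the items a seeded draw assigns to that split:

```python
# Illustrative sketch: each split re-iterates the source, so memory stays O(1)
# but the source is walked once per split. The same seed guarantees every item
# is assigned to exactly one split across passes.
import random

def split_pass(source, weights, seed, target):
    """Yield the items of `source` that a seeded draw assigns to `target`."""
    names = list(weights)
    rng = random.Random(seed)  # same seed => identical assignments per pass
    for item in source:
        pick = rng.choices(names, weights=[weights[n] for n in names])[0]
        if pick == target:
            yield item

weights = {"train": 0.8, "valid": 0.2}
train = list(split_pass(range(10), weights, seed=0, target="train"))
valid = list(split_pass(range(10), weights, seed=0, target="valid"))
assert sorted(train + valid) == list(range(10))  # disjoint and exhaustive
```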
Stack from [ghstack](https://github.com/ezyang/ghstack): * __->__ #728 * #746 Adding `__len__` to `DataLoader2`. See inline comments. We should discuss the details and whether this makes sense. Fixes #549 Differential Revision: [D38999743](https://our.internmc.facebook.com/intern/diff/D38999743)
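A rough sketch of what the delegation could look like; the attribute names here are assumptions for illustration, not the actual `DataLoader2` internals:

```python
# Hypothetical sketch only: forwards len() to the wrapped DataPipe, raising
# TypeError (as len() does) when the pipe has no valid length.
class DataLoader2:
    def __init__(self, datapipe, reading_service=None):
        self.datapipe = datapipe
        self.reading_service = reading_service

    def __len__(self):
        return len(self.datapipe)
```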
Add Examples of Common Preprocessing Steps with IterDataPipe (such as splitting a data set into two)
### 📚 The doc issue There are a few common steps that users often want to perform while preprocessing data, such as [splitting their data set](https://pytorch.org/docs/stable/data.html#torch.utils.data.random_split) into train and...
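For reference, the kind of snippet such examples could start from: splitting a map-style dataset with the linked `torch.utils.data.random_split` (the sizes and seed below are just examples):

```python
import torch
from torch.utils.data import TensorDataset, random_split

# Toy dataset: 100 (feature, label) pairs.
dataset = TensorDataset(torch.arange(100).float().unsqueeze(1), torch.arange(100))

# Reproducible 80/20 split.
generator = torch.Generator().manual_seed(0)
train_set, valid_set = random_split(dataset, [80, 20], generator=generator)
print(len(train_set), len(valid_set))  # 80 20
```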
### 🚀 The feature This issue proposes the addition of a linter for DataPipes and DataLoader2. The linter can analyze the graph of DataPipes and input arguments to DataLoader2, and...
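As a sketch of one rule such a linter could enforce, the snippet below warns when a `sharding_filter` appears upstream of a `shuffle` (shuffle-before-shard being the usual recommendation); the attribute walk is an assumption about how DataPipes reference their sources, not a stable API:

```python
from torch.utils.data import IterDataPipe
from torch.utils.data.datapipes.iter import IterableWrapper

def upstream(dp):
    """Yield DataPipes reachable from `dp` by following instance attributes."""
    for value in vars(dp).values():
        if isinstance(value, IterDataPipe):
            yield value
            yield from upstream(value)

def lint_shuffle_before_shard(dp):
    # Warn if any Shuffler has a ShardingFilter somewhere upstream of it.
    for node in (dp, *upstream(dp)):
        if type(node).__name__ == "ShufflerIterDataPipe":
            if any(type(n).__name__ == "ShardingFilterIterDataPipe" for n in upstream(node)):
                print("lint: shuffle appears after sharding_filter; "
                      "samples may be distributed unevenly across workers")

pipe = IterableWrapper(range(8)).sharding_filter().shuffle()
lint_shuffle_before_shard(pipe)
```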
### 📚 The doc issue As we further develop `DataLoader2`, we should add a separate page under "API Reference" for `DataLoader2` documentation. The page should cover topics such as adapters,...
### 🚀 The feature **Note that this is an RFC solely to discuss the design. There is currently no plan to implement this feature.** This issue serves as a...
When a new iterator is created, `DataLoader2` currently resumes from where it left off rather than resetting and starting from the beginning (see code snippet below). This is...
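An illustrative reconstruction of the described behavior (not the snippet from the issue; exact output depends on the torchdata version):

```python
from torchdata.dataloader2 import DataLoader2
from torchdata.datapipes.iter import IterableWrapper

dl = DataLoader2(IterableWrapper(range(10)))

it1 = iter(dl)
print(next(it1), next(it1), next(it1))  # 0 1 2

it2 = iter(dl)
print(next(it2))  # 3 if iteration resumes (the behavior this issue describes), 0 if it resets
```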
Currently, `in_batch_shuffle` doesn't use the RNG shared across processes. This can be problematic if it is used prior to sharding, resulting in certain samples not being used in training...
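A sketch of the ordering concern, assuming the usual datapipe functionals; the "risky" pipeline is the pattern this issue flags, and `shuffle()` before `sharding_filter()` is the usual safe alternative:

```python
from torchdata.datapipes.iter import IterableWrapper

# Ordering the issue flags: in_batch_shuffle runs before sharding, but (per this
# issue) it does not draw from the RNG shared across worker processes, so workers
# can shuffle batches differently and then keep inconsistent shards of them.
risky = (
    IterableWrapper(range(16))
    .batch(4)
    .in_batch_shuffle()
    .unbatch()
    .sharding_filter()
)

# The usual safe pattern: a global shuffle (shared, seedable RNG) before sharding.
safe = (
    IterableWrapper(range(16))
    .shuffle()
    .sharding_filter()
    .batch(4)
)
```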