ray_shuffling_data_loader
A Ray-based data loader with per-epoch shuffling and configurable pipelining, for shuffling and loading training data in distributed training of machine learning models.
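The core idea of per-epoch shuffling is that each epoch iterates over the same records in a fresh, reproducible random order. The library's actual API is not shown here; the following is a minimal stand-alone sketch of that concept, with all names (`epoch_batches`, `base_seed`) being illustrative assumptions.

```python
import random


def epoch_batches(records, batch_size, epoch, base_seed=0):
    """Yield shuffled batches for one epoch.

    Each epoch derives its own deterministic permutation from
    (base_seed, epoch), so re-running an epoch reproduces the same
    order while different epochs see different orders.
    """
    indices = list(range(len(records)))
    # Mix the epoch into the seed so each epoch gets a distinct permutation.
    random.Random(base_seed * 100003 + epoch).shuffle(indices)
    for start in range(0, len(indices), batch_size):
        yield [records[i] for i in indices[start:start + batch_size]]
```

In the real loader this shuffling happens in parallel across Ray tasks and is pipelined with training, but the per-epoch reseeding pattern is the same.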
Signed-off-by: Richard Liaw
Add unit tests for the following:

1. Shuffler (`shuffle.py`)
2. `BatchQueue` (`batch_queue.py`)
3. `ShufflingDataset` (`dataset.py`)
4. `TorchShufflingDataset` (`torch_dataset.py`)

### Implementation Notes

- `ShufflingDataset` and `TorchShufflingDataset` testing could initially be as...
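As a starting point, a `BatchQueue` test could assert FIFO ordering of produced batches. The sketch below uses the standard library's `queue.Queue` as a stand-in, since `BatchQueue`'s actual interface isn't shown here; a real test would import it from `batch_queue.py`.

```python
import queue


def test_batch_queue_fifo_ordering():
    # Stand-in for BatchQueue: verify batches come out in the order
    # they were put in, and the queue drains to empty.
    q = queue.Queue(maxsize=2)
    q.put("batch-0")
    q.put("batch-1")
    assert q.get() == "batch-0"
    assert q.get() == "batch-1"
    assert q.empty()
```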
Right now, the number of shuffle mappers is equal to the number of input files, with each mapper reading a single input file. However:

1. For very large files, we...
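One way to decouple mapper count from file count is to plan byte-range shards per file, so a very large file feeds several mappers. This is a hypothetical planning helper, not the project's implementation; `plan_shards` and `target_shard_bytes` are assumed names.

```python
def plan_shards(file_sizes, target_shard_bytes):
    """Split each file into byte-range shards of at most target size.

    Returns (file_index, start, end) tuples; a file larger than the
    target produces multiple shards, each assignable to its own mapper.
    """
    shards = []
    for i, size in enumerate(file_sizes):
        start = 0
        while start < size:
            end = min(start + target_shard_bytes, size)
            shards.append((i, start, end))
            start = end
    return shards
```

For formats like Parquet, the split boundaries would follow row groups rather than raw byte offsets, but the planning shape is the same.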
We need to build a connector to a TF dataset iterator. Impl idea from @clarkzinzow: We'd take the base shuffling dataset and create a `ShufflingTFDataset` that converts each batch dataframe to feature...
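The conversion step could be a generator that flattens each batch into `(features, label)` tuples, which is the shape `tf.data.Dataset.from_generator` consumes. To keep this sketch self-contained it models a batch as a dict of column lists rather than a pandas DataFrame; the function and column names are illustrative assumptions, not the proposed API.

```python
def batches_to_examples(batches, feature_cols, label_col):
    """Flatten batch "dataframes" (dicts of column lists) into
    (features, label) tuples, row by row.

    A real ShufflingTFDataset would wrap a generator like this with
    tf.data.Dataset.from_generator, declaring output signatures.
    """
    for batch in batches:
        n_rows = len(batch[label_col])
        for row in range(n_rows):
            features = tuple(batch[col][row] for col in feature_cols)
            yield features, batch[label_col][row]
```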
Add a user guide to the docs. This should include:

- [ ] Walkthrough of installation, setup, and an example.
- [ ] Cataloguing of all configuration details, e.g. what to set `num_reducers`...