ray_shuffling_data_loader

A Ray-based data loader with per-epoch shuffling and configurable pipelining, for loading and shuffling training data for distributed training of machine learning models.

8 ray_shuffling_data_loader issues

Add unit tests for the following:
1. Shuffler (`shuffle.py`)
2. `BatchQueue` (`batch_queue.py`)
3. `ShufflingDataset` (`dataset.py`)
4. `TorchShufflingDataset` (`torch_dataset.py`)

Implementation notes: `ShufflingDataset` and `TorchShufflingDataset` testing could initially be as...
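As a starting point, shuffler tests can be property-based: each epoch's output should be a permutation of the input, and shuffles should be deterministic for a fixed seed. The sketch below uses a stand-in `shuffle_epoch` function, since the actual `shuffle.py` implementation is not shown in this listing; the test shape is what matters.

```python
import random

def shuffle_epoch(rows, seed):
    # Stand-in for the real shuffler: deterministically permute the rows
    # of one epoch given that epoch's seed.
    rng = random.Random(seed)
    shuffled = list(rows)
    rng.shuffle(shuffled)
    return shuffled

def test_shuffle_is_a_permutation():
    rows = list(range(100))
    out = shuffle_epoch(rows, seed=0)
    # Same multiset of rows comes back out.
    assert sorted(out) == rows

def test_shuffle_is_deterministic_per_seed():
    rows = list(range(100))
    # Same seed -> same order; different seeds -> different orders.
    assert shuffle_epoch(rows, seed=1) == shuffle_epoch(rows, seed=1)
    assert shuffle_epoch(rows, seed=1) != shuffle_epoch(rows, seed=2)

test_shuffle_is_a_permutation()
test_shuffle_is_deterministic_per_seed()
```

The same permutation/determinism properties can be asserted against the real `Shuffler` once its entry point is wired in.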

Add a contributing README to the repo.

documentation

Right now, the number of shuffle mappers is equal to the number of input files, with each mapper reading a single input file. However:
1. For very large files, we...
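One way to decouple mapper count from file count is to plan read tasks by splitting large files into roughly fixed-size ranges, so a big file gets several mappers. This is a hypothetical sketch, not the library's actual planner; for a columnar format like Parquet the split boundaries would follow row groups rather than raw byte offsets.

```python
def plan_read_tasks(file_sizes, target_bytes):
    """Split each file into ~target_bytes ranges, one task per mapper.

    file_sizes: dict mapping file path -> size in bytes.
    Returns a list of (path, start, end) half-open byte ranges.
    """
    tasks = []
    for path, size in file_sizes.items():
        start = 0
        while start < size:
            end = min(start + target_bytes, size)
            tasks.append((path, start, end))
            start = end
    return tasks

# A 250-byte file yields three mappers; an 80-byte file yields one.
tasks = plan_read_tasks({"a.parquet": 250, "b.parquet": 80}, target_bytes=100)
# tasks == [("a.parquet", 0, 100), ("a.parquet", 100, 200),
#           ("a.parquet", 200, 250), ("b.parquet", 0, 80)]
```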

enhancement

We need to build a connector to a TF dataset iterator. Implementation idea from @clarkzinzow: we'd take the base shuffling dataset and create a `ShufflingTFDataset` that converts each batch dataframe to feature...
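The conversion step could look like the sketch below: split each batch DataFrame into a features dict plus a label array, the structure that `tf.data.Dataset.from_tensor_slices()` accepts. The `batch_to_tf_slices` name and `label_col` parameter are hypothetical, not part of the library; TensorFlow itself is only referenced in comments so the sketch stays self-contained.

```python
import pandas as pd

def batch_to_tf_slices(batch: pd.DataFrame, label_col: str):
    """Split a batch DataFrame into (features dict, labels array).

    The returned pair can be fed to
    tf.data.Dataset.from_tensor_slices((features, labels)).
    """
    labels = batch[label_col].to_numpy()
    features = {
        col: batch[col].to_numpy()
        for col in batch.columns
        if col != label_col
    }
    return features, labels

batch = pd.DataFrame({"f0": [1.0, 2.0], "f1": [3.0, 4.0], "y": [0, 1]})
features, labels = batch_to_tf_slices(batch, label_col="y")
# features holds the "f0" and "f1" columns; labels holds the "y" column.
```

A `ShufflingTFDataset` would apply this per batch, e.g. inside a generator wrapped by `tf.data.Dataset.from_generator`.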

good first issue

Add user guide to docs. This should include:
- [ ] Walkthrough of installation, setup, and example.
- [ ] Cataloguing of all configuration details, e.g. what to set `num_reducers`...

documentation