WIP: shuffle operation optimization uses smaller dtype for building i…
Why are these changes needed?
The problem is limited to the sort and random_shuffle functions, both of which build a numpy array of row indices to reorder the rows in a block. Since numpy builds these index arrays with a 64-bit integer dtype by default, the memory overhead during index creation becomes significant when each row is small (i.e. the block has few columns).
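As a rough sketch of the idea (my illustration, not the exact code in this PR; the helper name is hypothetical), the permutation's dtype can be chosen from the row count instead of defaulting to int64:

```python
import numpy as np

def make_permutation(num_rows: int, seed: int | None = None) -> np.ndarray:
    """Hypothetical helper: build a random permutation of row indices using
    the smallest unsigned integer dtype that can hold num_rows - 1."""
    # e.g. uint16 for 50k rows, uint32 for 5M rows, instead of int64 --
    # cutting the index array's memory by 4x or 2x.
    dtype = np.min_scalar_type(max(num_rows - 1, 0))
    indices = np.arange(num_rows, dtype=dtype)
    np.random.default_rng(seed).shuffle(indices)
    return indices
```

To make the overhead concrete: for a block with a single int64 column, an int64 index array roughly doubles peak memory during the reorder, while a uint32 index cuts that overhead in half.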
Related issue number
Closes #42146
Checks
- [x] I've signed off every commit (by using the -s flag, i.e., `git commit -s`) in this PR.
- [x] I've run `scripts/format.sh` to lint the changes in this PR.
- [x] I've made sure the tests are passing. Note that there might be a few flaky tests; see the recent failures at https://flakey-tests.ray.io/
- Testing Strategy
  - [x] Unit tests
  - [ ] Evaluate and benchmark the two approaches
This is a WIP: I have not yet implemented the second approach (in-memory shuffling), nor benchmarked the two approaches against the original baseline. I'm waiting for feedback on #42146. A sketch of what the in-memory approach could look like follows.
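For illustration only, and assuming the block is NumPy-backed, an in-memory shuffle could swap rows in place so that no index array is allocated at all; this is my reading of the idea, not code from this PR:

```python
import numpy as np

def shuffle_rows_in_place(block: np.ndarray, seed: int | None = None) -> None:
    """Illustrative sketch: shuffle the rows of a NumPy-backed block in place.

    Generator.shuffle performs an in-place Fisher-Yates shuffle along
    axis 0, so the extra memory is O(1) rather than the O(num_rows)
    index array used by the take-based path."""
    np.random.default_rng(seed).shuffle(block, axis=0)
```

This would only apply to blocks that support in-place mutation; immutable (e.g. Arrow-backed) blocks would presumably still need the index-based path.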