[Air][perf]: Add shape check to reduce the number of calls to pa.concat_arrays.
Why are these changes needed?
Problem
In our application, which uses map_batches in Ray Data, the time spent transforming blocks is not negligible; this was also mentioned in https://github.com/ray-project/ray/issues/42960 and https://github.com/ray-project/ray/issues/42377.
Based on our profiling, transforming the blocks can take up to 3% of total CPU time and 25% of CPU time in the worker process (same test case as in https://github.com/ray-project/ray/issues/42377, provided by @stephanie-wang).
Most of that time is spent in _concat_same_type in ray/air/util/tensor_extensions/arrow.py.
The 3% of total CPU time covers all components, which is not negligible. (Thanks to @ysfess22 and @Superskyyy for their help and the profiling tool.)
0.68s of the default_worker's total of 2.7s (25%).
Total CPU time compared with other components
Improvements
In this PR, we propose adding a shape check before _concat_same_type calls pa.concat_arrays. Most of the time there is only a single chunk to concatenate, yet we still pay the cost of calling pa.concat_arrays, which is not an efficient operation for this degenerate case.
After this improvement, the time spent transforming blocks drops to nearly zero and no longer shows up in the profiling results.
Data throughput before
Dataset throughput:
* Ray Data throughput: 38.01358118312467 rows/s
* Estimated single node throughput: 4.779325257480674 rows/s
Total time: 42.2807s
Data throughput after optimization
Dataset throughput:
* Ray Data throughput: 39.091669623436424 rows/s
* Estimated single node throughput: 4.897569999968348 rows/s
Total time: 41.1154s
Feel free to ask any questions; @nemo9cby and @liuxsh9 will also be happy to discuss this issue.
Related issue number
Improves, but can't yet close, https://github.com/ray-project/ray/issues/42960 and https://github.com/ray-project/ray/issues/42377.
Checks
- [√] I've signed off every commit (by using the `-s` flag, i.e., `git commit -s`) in this PR.
- [√] I've run `scripts/format.sh` to lint the changes in this PR.
- [ ] I've included any doc changes needed for https://docs.ray.io/en/master/.
- [ ] I've added any new APIs to the API Reference. For example, if I added a method in Tune, I've added it in `doc/source/tune/api/` under the corresponding `.rst` file.
- [ ] I've made sure the tests are passing. Note that there might be a few flaky tests, see the recent failures at https://flakey-tests.ray.io/
- Testing Strategy
- [√] Unit tests
- [√] Release tests
- [ ] This PR is not tested :(
Locally, I can pass `test_block_slicing` every time:

```
pytest test_dynamic_block_split.py::test_block_slicing
=========================================================== test session starts ============================================================
platform linux -- Python 3.10.13, pytest-8.2.0, pluggy-1.5.0
rootdir: /mnt/sda1/zhilong/ray_build/workspace/ray
configfile: pytest.ini
collected 5 items
test_dynamic_block_split.py .....                                                                                                    [100%]
============================================================ 5 passed in 4.72s =============================================================
```

but I don't know why it fails here.
I think it's a pyarrow version problem. It works correctly with pyarrow>=7.0.0.
Hey @Bye-legumes, thanks for opening this PR! The change makes sense to me.
I'm not sure if we'll be able to bump the minimum Arrow version to 7. Would it be difficult to make the test pass with Arrow 6?
Yeah... I don't know why the result is different on Arrow 6, but Arrow 7 and above are OK. Either way, the underlying problem remains that we call concat_arrays too many times.
Closing; this is a bit too far removed from today's codebase.
cc @bveeramani