[Air][perf]: Add shape check to reduce the number of calls to pa.concat_arrays.
Why are these changes needed?
Problem
In our application, which uses map_batches in Ray Data, the time spent transforming blocks is not negligible; this was also mentioned in https://github.com/ray-project/ray/issues/42960 and https://github.com/ray-project/ray/issues/42377.
Based on our profiling, transforming the blocks can take up to 3% of total CPU time and 25% of CPU time in the worker process (same test case as in https://github.com/ray-project/ray/issues/42377, provided by @stephanie-wang).
Most of that time is spent in _concat_same_type in ray/air/util/tensor_extensions/arrow.py.
The 3% of total CPU time covers all components, which is not negligible. (Thanks to @ysfess22 and @Superskyyy for their help and the profiling tool.)
0.68s of the default_worker's total of 2.7s (25%).
Total CPU time compared with other components
Improvements
In this PR, we propose adding a shape check before _concat_same_type calls pa.concat_arrays. Most of the time there is only a single chunk to concatenate, yet we still pay the cost of calling pa.concat_arrays, which is not an efficient operation for this degenerate case.
After this improvement, the time spent transforming blocks drops to nearly zero and no longer shows up in the profiling results.
Data throughput before
Dataset throughput:
* Ray Data throughput: 38.01358118312467 rows/s
* Estimated single node throughput: 4.779325257480674 rows/s
Total time: 42.2807s
Data throughput after optimization
Dataset throughput:
* Ray Data throughput: 39.091669623436424 rows/s
* Estimated single node throughput: 4.897569999968348 rows/s
Total time: 41.1154s
Feel free to ask any questions; @nemo9cby and @liuxsh9 will also be happy to discuss this issue.
Related issue number
Improves, but can't yet close, https://github.com/ray-project/ray/issues/42960 and https://github.com/ray-project/ray/issues/42377.
Checks
- [√] I've signed off every commit (by using the `-s` flag, i.e., `git commit -s`) in this PR.
- [√] I've run `scripts/format.sh` to lint the changes in this PR.
- [ ] I've included any doc changes needed for https://docs.ray.io/en/master/.
- [ ] I've added any new APIs to the API Reference. For example, if I added a method in Tune, I've added it in `doc/source/tune/api/` under the corresponding `.rst` file.
- [ ] I've made sure the tests are passing. Note that there might be a few flaky tests, see the recent failures at https://flakey-tests.ray.io/
- Testing Strategy
- [√] Unit tests
- [√] Release tests
- [ ] This PR is not tested :(
Locally, I can pass `test_block_slicing` every time:

```
pytest test_dynamic_block_split.py::test_block_slicing
=========================================================== test session starts ============================================================
platform linux -- Python 3.10.13, pytest-8.2.0, pluggy-1.5.0
rootdir: /mnt/sda1/zhilong/ray_build/workspace/ray
configfile: pytest.ini
collected 5 items
test_dynamic_block_split.py .....                                                                                                    [100%]
============================================================ 5 passed in 4.72s =============================================================
```

but I don't know why it fails here.
I think it's a pyarrow version problem. It works correctly with pyarrow>=7.0.0.
Hey @Bye-legumes, thanks for opening this PR! The change makes sense to me.
I'm not sure if we'll be able to bump the minimum Arrow version to 7. Would it be difficult to make the test pass with Arrow 6?
Yeah... I don't know why the result is different on Arrow 6, but Arrow 7 and above are OK. Either way, the underlying problem remains that we call concat_arrays too many times.
Closing; this is a bit too far removed from today's codebase.
cc @bveeramani