GH-39565: [C++] Do not concatenate ChunkedArray when running take function

Open amol- opened this issue 1 year ago • 5 comments

Rationale for this change

We can avoid the unnecessary work and memory consumption of concatenating chunks when running take. Instead, we can run take directly on the chunks; the only cost is remapping the indices, which are usually far fewer than the elements of the array we are taking from.
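To make the index remapping concrete, here is a minimal sketch of the idea in plain C++. It does not use the Arrow API and is not the implementation in this PR: `Chunk`, `Chunked`, `ChunkOffsets`, `Resolve` and `TakeChunked` are hypothetical names, and plain `std::vector<int>` stands in for Arrow arrays. Each logical index is mapped to a (chunk, local index) pair by a binary search over the cumulative chunk offsets, so the chunks are never concatenated. How the output is split into chunks here (a new chunk whenever the source chunk changes) is also just an assumption for illustration.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <utility>
#include <vector>

// Hypothetical stand-ins for Arrow arrays: a chunk is a vector of ints.
using Chunk = std::vector<int>;
using Chunked = std::vector<Chunk>;

// offsets[i] is the logical index of the first element of chunk i;
// offsets.back() is the total logical length.
std::vector<std::int64_t> ChunkOffsets(const Chunked& chunks) {
  std::vector<std::int64_t> offsets{0};
  offsets.reserve(chunks.size() + 1);
  for (const Chunk& chunk : chunks) {
    offsets.push_back(offsets.back() + static_cast<std::int64_t>(chunk.size()));
  }
  return offsets;
}

// Map a logical index to (chunk index, index inside that chunk) with a
// binary search over the offsets. Empty chunks are skipped naturally.
std::pair<std::size_t, std::size_t> Resolve(
    const std::vector<std::int64_t>& offsets, std::int64_t index) {
  if (index < 0 || index >= offsets.back()) {
    throw std::out_of_range("take index out of bounds");
  }
  const auto it = std::upper_bound(offsets.begin(), offsets.end(), index);
  const std::size_t chunk = static_cast<std::size_t>(it - offsets.begin()) - 1;
  return {chunk, static_cast<std::size_t>(index - offsets[chunk])};
}

// Take without concatenating: every index is resolved against the original
// chunks. Consecutive indices that hit the same source chunk are grouped
// into one output chunk, so the result may consist of several chunks.
Chunked TakeChunked(const Chunked& chunks,
                    const std::vector<std::int64_t>& indices) {
  const std::vector<std::int64_t> offsets = ChunkOffsets(chunks);
  Chunked out;
  std::size_t current_chunk = 0;
  for (const std::int64_t index : indices) {
    const auto position = Resolve(offsets, index);
    if (out.empty() || position.first != current_chunk) {
      out.emplace_back();  // start a new output chunk
      current_chunk = position.first;
    }
    out.back().push_back(chunks[position.first][position.second]);
  }
  return out;
}
```

In this sketch the extra memory is proportional to the number of indices plus one offset per chunk, not to the size of the source array, which is the trade-off described above.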

Are these changes tested?

Two existing tests already verify take on ChunkedArray and cover the corner cases well. The only tweak they needed is that take now returns a ChunkedArray made of multiple chunks instead of a single one.

  • Closes: #39565

amol- avatar Jan 11 '24 14:01 amol-

:warning: GitHub issue #39565 has been automatically assigned in GitHub to PR creator.

github-actions[bot] avatar Jan 11 '24 14:01 github-actions[bot]

@felipecrv @amol- Should this PR be kept open now that #40206 was merged?

pitrou avatar Feb 28 '24 15:02 pitrou

> @felipecrv @amol- Should this PR be kept open now that #40206 was merged?

I think so. This PR is focused on optimizing TakeCA, while the one that was merged was focused on TakeCC.

amol- avatar Feb 28 '24 16:02 amol-

Before my PR: TakeCC made num_chunks Concatenate(chunks) calls. After my PR: TakeCC makes 1 Concatenate(chunks) call.

Next step (and goal of amol's PR/issue pair): 0 concatenations.

felipecrv avatar Feb 28 '24 17:02 felipecrv

I opened #41700 which can handle all the fixed-width types without concatenation.

felipecrv avatar May 17 '24 04:05 felipecrv