arrow icon indicating copy to clipboard operation
arrow copied to clipboard

GH-32107: [C++] Create Filter Kernels for REE values

Open felipecrv opened this issue 2 years ago • 6 comments

Rationale for this change

Run-end encoded arrays can very compactly represent filters with repeated values and run-end encoded arrays can be filtered efficiently without inflating all the runs for filtering. This PR implements algorithms that make it possible to leverage these possibilities.

What changes are included in this PR?

  • [x] REE x REE filters kernels (for the same types supported in run_end_encode) Closes #13657
  • [x] Plain values x REE filter
  • [x] REE values x Plain filter (boolean bitmap)

Are these changes tested?

By unit tests that handle all the REE/Plain combinations.

Are there any user-facing changes?

Users don't control which filter kernels get picked directly, so there isn't a user-facing API area yet.

  • Closes: #32107

felipecrv avatar Apr 10 '23 13:04 felipecrv

  • Closes: #32107

github-actions[bot] avatar Apr 10 '23 13:04 github-actions[bot]

:warning: GitHub issue #32107 has been automatically assigned in GitHub to PR creator.

github-actions[bot] avatar Apr 10 '23 13:04 github-actions[bot]

:warning: GitHub issue #32107 has been automatically assigned in GitHub to PR creator.

github-actions[bot] avatar Apr 11 '23 22:04 github-actions[bot]

:warning: GitHub issue #32107 has been automatically assigned in GitHub to PR creator.

github-actions[bot] avatar May 13 '23 15:05 github-actions[bot]

@felipecrv do you have any anecdotal benchmark numbers?

alippai avatar May 13 '23 20:05 alippai

@felipecrv do you have any anecdotal benchmark numbers?

@alippai For PlainxREE, only the intuition that for a sufficiently run-end-compressed filter it will be faster than scanning a full bitmap.

But I will be writing benchmarks for the 3 combinations when I complete the unit tests.

felipecrv avatar May 15 '23 15:05 felipecrv