arrow icon indicating copy to clipboard operation
arrow copied to clipboard

ARROW-17278: [C++][Benchmarking] Add AsofJoin Ordering Assertion and Benchmark Fixes

Open iChauster opened this issue 2 years ago • 7 comments

  • Add a invalid status if batches are ingested in an incorrect order for any source in AsOfJoin.
  • Fix benchmarks to use isolated memory pools.
  • Create and discard memory pools over different benchmark iterations (if possible) to prevent OOM from creating too many memory pools.

iChauster avatar Aug 01 '22 21:08 iChauster

https://issues.apache.org/jira/browse/ARROW-17278

github-actions[bot] avatar Aug 01 '22 21:08 github-actions[bot]

:warning: Ticket has not been started in JIRA, please click 'Start Progress'.

github-actions[bot] avatar Aug 01 '22 21:08 github-actions[bot]

@westonpace Hi Weston,

I wrote some new tests to validate that our in-order assertions are working (it returns an invalid status which is caught by ErrorIfNotOk) . They seem to detect the out-of-ordered test locally on my computer and it passes, but it seems to fail on the CI checks -- any idea why?

iChauster avatar Aug 02 '22 16:08 iChauster

I think I found the issue, doesn't seem like it passes always?

EDIT: seems like some sort of race condition. We only run the check when are sure the queue has batches in it, however, it seems like in the cases in which the tests fails, it is always empty. The execution context is setup with nullptr as well. Any ideas here @icexelloss ?

iChauster avatar Aug 02 '22 21:08 iChauster

Can you print thread id and see if input received is called by different thread?( for the same input table). Also I think we can check when queue is not empty too, we just need to remember the current timestamp in the input state.

On Tue, Aug 2, 2022 at 5:36 PM Ivan Chau @.***> wrote:

I think I found the issue, doesn't seem like it passes always?

EDIT: seems like some sort of race condition. We only run the check when are sure the queue has batches in it, however, it seems like in the cases in which the tests fails, it is always empty. The execution context is setup with nullptr as well. Any ideas here @icexelloss https://github.com/icexelloss ?

— Reply to this email directly, view it on GitHub https://github.com/apache/arrow/pull/13771#issuecomment-1203227263, or unsubscribe https://github.com/notifications/unsubscribe-auth/AAGBXLDCYVT3PCXTQLUBSOLVXGIEZANCNFSM55I6LW6A . You are receiving this because you were mentioned.Message ID: @.***>

icexelloss avatar Aug 02 '22 21:08 icexelloss

Resolved, was checking the last batch in the queue to find latest time, however these batches get consumed and the queue can become empty, instead introduce another instance variable to track latest time seen in the InputState.

iChauster avatar Aug 03 '22 14:08 iChauster

@westonpace - @iChauster has finished his internship last week so I will take this PR over. Let me know if you have more comments.

icexelloss avatar Aug 08 '22 13:08 icexelloss

@westonpace Are there more changes needed for this PR?

icexelloss avatar Aug 18 '22 14:08 icexelloss

@westonpace Anything else we need to do here for merging?

icexelloss avatar Sep 06 '22 13:09 icexelloss

Closing because it has been untouched for a while, in case it's still relevant feel free to reopen and move it forward 👍

amol- avatar Mar 30 '23 17:03 amol-