Introduce `REBATCH` operator for optimization

Open xzdandy opened this issue 1 year ago • 1 comments

Search before asking

  • [X] I have searched the EvaDB issues and found no similar feature requests.

Description

While experimenting with the GitHub data source in #1233, one side finding is that we cannot do batching in the storage engine; otherwise, LIMIT will not work properly.

More generally, different operators, functions, Ray pipelines, and hardware can benefit from different batch sizes. For more systematic support, we can introduce a REBATCH operator. The optimizer can then insert REBATCH into the plan as needed.

The REBATCH operator will take two possible configurations:

  • batch_mem_size: batch based on memory limit
  • batch_size: direct batch size limit
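
To make the two options concrete, here is a minimal sketch of what a rebatching step could look like. It is not EvaDB's implementation: the `rebatch` function, the list-of-rows batch representation, and the use of `sys.getsizeof` as a (shallow, approximate) memory estimate are all assumptions for illustration. Only the parameter names `batch_size` and `batch_mem_size` come from the proposal above.

```python
import sys
from typing import Any, Iterable, Iterator, List, Optional

Row = Any
Batch = List[Row]


def rebatch(
    batches: Iterable[Batch],
    batch_size: Optional[int] = None,
    batch_mem_size: Optional[int] = None,
) -> Iterator[Batch]:
    """Regroup incoming batches so each output batch respects the limits.

    Hypothetical helper, not an EvaDB API. Row memory is estimated with
    sys.getsizeof, which is shallow and only a rough approximation.
    """
    if batch_size is None and batch_mem_size is None:
        raise ValueError("set batch_size and/or batch_mem_size")
    if batch_size is not None and batch_size <= 0:
        raise ValueError("batch_size must be positive")

    current: Batch = []
    current_mem = 0
    for batch in batches:
        for row in batch:
            row_mem = sys.getsizeof(row)
            # Flush first if adding this row would exceed the memory limit.
            # A single oversized row is still emitted as its own batch.
            if (
                batch_mem_size is not None
                and current
                and current_mem + row_mem > batch_mem_size
            ):
                yield current
                current, current_mem = [], 0
            current.append(row)
            current_mem += row_mem
            # Flush as soon as the row-count limit is reached.
            if batch_size is not None and len(current) >= batch_size:
                yield current
                current, current_mem = [], 0
    # Emit whatever is left over as a final, possibly smaller, batch.
    if current:
        yield current
```

For example, `list(rebatch([[1, 2, 3], [4], [5, 6, 7, 8]], batch_size=3))` regroups the three uneven input batches into `[[1, 2, 3], [4, 5, 6], [7, 8]]`. Writing it as a generator means a downstream consumer can stop pulling batches early, which is one way a rebatching step could coexist with operators that need only part of the input.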

Looking for feedback! Thanks!

Use case

No response

Are you willing to submit a PR?

  • [ ] Yes I'd like to help by submitting a PR!

xzdandy · Sep 29 '23 19:09

Without batching, the project leads to significant overhead even for a simple query such as `SELECT * FROM sqlite_data.home_sales;`. The raw profile can be found at https://github.com/georgia-tech-db/evadb/blob/xzdandy/select_all_home_sales.profile

with_batch

xzdandy · Oct 12 '23 07:10