evadb Introduce `REBATCH` operator for optimization

Open xzdandy opened this issue 1 year ago • 1 comments

Search before asking

[X] I have searched the EvaDB issues and found no similar feature requests.

Description

While experimenting with the GitHub data source in #1233, one side finding is that we cannot do batching in the storage engine; otherwise `LIMIT` will not work properly.

More generally, different operators, functions, Ray pipelines, and hardware can benefit from different batch sizes. So for more systematic support, we can introduce a `REBATCH` operator. The optimizer can inject `REBATCH` into the plan as needed.

The `REBATCH` operator will take two possible configurations:

- `batch_mem_size`: batch based on a memory limit
- `batch_size`: direct batch size limit
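
To make the two configurations concrete, here is a minimal sketch of what a rebatching step could look like. It is not EvaDB's implementation: the function name `rebatch`, the `row_mem` size estimator, and the use of plain Python lists as batches are all assumptions made for illustration. A real operator would work on EvaDB's own batch type and would need a better memory estimate than `sys.getsizeof`.

```python
import sys
from typing import Any, Callable, Iterable, Iterator, List, Optional


def rebatch(
    batches: Iterable[List[Any]],
    batch_size: Optional[int] = None,
    batch_mem_size: Optional[int] = None,
    row_mem: Callable[[Any], int] = sys.getsizeof,  # hypothetical size estimator
) -> Iterator[List[Any]]:
    """Regroup incoming batches by row count or by estimated memory.

    Hypothetical helper: exactly one of batch_size / batch_mem_size must be set.
    """
    if (batch_size is None) == (batch_mem_size is None):
        raise ValueError("set exactly one of batch_size or batch_mem_size")

    out: List[Any] = []
    out_mem = 0
    for batch in batches:
        for row in batch:
            size = row_mem(row)
            # Memory mode: flush before adding a row that would exceed the limit.
            # A single oversized row still gets emitted in a batch of its own.
            if batch_mem_size is not None and out and out_mem + size > batch_mem_size:
                yield out
                out, out_mem = [], 0
            out.append(row)
            out_mem += size
            # Count mode: flush as soon as the batch is full.
            if batch_size is not None and len(out) == batch_size:
                yield out
                out, out_mem = [], 0
    if out:
        yield out  # emit the final, possibly smaller, batch


# Usage: the source yields one row per batch; downstream wants two rows per batch.
for b in rebatch([[1], [2], [3], [4, 5]], batch_size=2):
    print(b)
```

Because the sketch is a generator, it only pulls as many input batches as the consumer asks for, which is one way an unbatched source could stay compatible with an early-terminating operator while downstream operators still receive larger batches.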

Looking for feedback! Thanks!

Use case

No response

Are you willing to submit a PR?

[ ] Yes I'd like to help by submitting a PR!

Sep 29 '23 19:09 xzdandy

Without batching, the project operator adds significant overhead even for a simple query such as `SELECT * FROM sqlite_data.home_sales;`. The raw profile can be found at https://github.com/georgia-tech-db/evadb/blob/xzdandy/select_all_home_sales.profile

[Image: with_batch]

Oct 12 '23 07:10 xzdandy
