evadb
Introduce `REBATCH` operator for optimization
Search before asking
- [X] I have searched the EvaDB issues and found no similar feature requests.
Description
While experimenting with the GitHub data source in #1233, we found as a side result that we cannot do batching in the storage engine; otherwise `LIMIT` will not work properly.
More generally, different operators, functions, Ray pipelines, and hardware can benefit from different batch sizes. For more systematic support, we can introduce a `REBATCH` operator. The optimizer can then insert `REBATCH` into the plan as needed.
The `REBATCH` operator will take two possible configurations:
- batch_mem_size: batch based on memory limit
- batch_size: direct batch size limit
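
To make the two configurations concrete, here is a minimal sketch of what a rebatching step could look like. It is not EvaDB code: the `rebatch` function, its `size_of` parameter, and the list-of-dicts batch representation are all assumptions made for illustration. Only the `batch_size` and `batch_mem_size` names come from the proposal above.

```python
import sys
from typing import Callable, Iterable, Iterator, List, Optional


def rebatch(
    batches: Iterable[List[dict]],
    batch_size: Optional[int] = None,
    batch_mem_size: Optional[int] = None,
    size_of: Callable[[dict], int] = sys.getsizeof,
) -> Iterator[List[dict]]:
    """Regroup incoming batches (lists of rows) into new batches.

    Hypothetical sketch, not EvaDB's API. A batch is emitted once it holds
    `batch_size` rows, or just before a row would push its estimated size
    past `batch_mem_size` bytes. `size_of` is an assumed per-row estimator.
    """
    if batch_size is None and batch_mem_size is None:
        raise ValueError("set batch_size and/or batch_mem_size")
    if batch_size is not None and batch_size <= 0:
        raise ValueError("batch_size must be positive")

    current: List[dict] = []
    current_bytes = 0
    for batch in batches:
        for row in batch:
            row_bytes = size_of(row)
            # Flush first if this row would exceed the memory budget.
            # A single oversized row still forms a batch of its own.
            if (
                batch_mem_size is not None
                and current
                and current_bytes + row_bytes > batch_mem_size
            ):
                yield current
                current, current_bytes = [], 0
            current.append(row)
            current_bytes += row_bytes
            # Flush as soon as the row-count limit is reached.
            if batch_size is not None and len(current) >= batch_size:
                yield current
                current, current_bytes = [], 0
    # Emit whatever is left over as a final, possibly smaller, batch.
    if current:
        yield current


# Usage: upstream yields single-row batches; downstream wants 3 rows each.
rows = [{"id": i} for i in range(7)]
unbatched = [[r] for r in rows]
by_count = list(rebatch(unbatched, batch_size=3))
by_memory = list(rebatch(unbatched, batch_mem_size=25, size_of=lambda r: 10))
```

Because the sketch is a lazy generator, an upstream source can keep emitting small batches while a downstream consumer receives larger ones, and nothing is pulled from the source until the consumer asks for it.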
Looking for feedback! Thanks!
Use case
No response
Are you willing to submit a PR?
- [ ] Yes I'd like to help by submitting a PR!
Without batching, the project leads to significant overhead even for a query as simple as `SELECT * FROM sqlite_data.home_sales;`. The raw profile can be found at https://github.com/georgia-tech-db/evadb/blob/xzdandy/select_all_home_sales.profile