tremor-runtime

optimize `where` and `having` for select statements and streams

Open Licenser opened this issue 3 years ago • 0 comments

Describe the problem you are trying to solve

Code like this, where we split a stream on multiple conditions, is fairly common:

# this is simplified to illustrate the more complex use case
select event from in where event == "exit" into exit; # we call this select1
select event from in where event != "exit" into out; # we call this select2

Because of the way streams work, the following happens in the runtime:

  1. the event arrives at `in`
  2. the event gets cloned and scheduled for select1
  3. the event gets scheduled for select2
  4. ...

This means that we clone the event just to have it filtered out at the `where` clause. That clone is avoidable.
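To make the cost concrete, here is a minimal simulation of the clone-then-filter flow described in the steps above. It is written in Python purely for illustration and is not tremor-runtime code; `fan_out_then_filter` and the plain-string events are hypothetical stand-ins for the runtime's scheduling and event types.

```python
import copy

def fan_out_then_filter(events, predicates):
    """Clone the event for every branch except the last, then apply each branch's `where`."""
    clones = 0
    outputs = [[] for _ in predicates]
    last = len(predicates) - 1
    for event in events:
        for i, keep in enumerate(predicates):
            if i < last:
                # Earlier branches get their own copy before any filtering happens.
                candidate = copy.deepcopy(event)
                clones += 1
            else:
                # The last branch can take the original event.
                candidate = event
            if keep(candidate):
                outputs[i].append(candidate)
    return outputs, clones

events = ["exit", "a", "b", "exit"]
predicates = [
    lambda e: e == "exit",  # select1
    lambda e: e != "exit",  # select2
]
outputs, clones = fan_out_then_filter(events, predicates)
print(outputs)  # [['exit', 'exit'], ['a', 'b']]
print(clones)   # 4
```

Of the four clones, the two made for "a" and "b" are dropped straight away by select1's `where`, which is the wasted work this issue is about.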

Describe the solution you'd like

The optimizer should move `where` clauses "left" into the prior node, so that the original event is filtered there and we avoid a clone for events that would be dropped by `where`.
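A rough sketch of what moving the filter "left" could look like, again as a Python simulation rather than actual runtime code. `filter_then_fan_out` is a hypothetical name, and the sketch assumes each `where` predicate depends only on the incoming event, so it is safe to evaluate it before the event is cloned.

```python
import copy

def filter_then_fan_out(events, predicates):
    """Evaluate every `where` on the original event, then clone only for branches that keep it."""
    clones = 0
    outputs = [[] for _ in predicates]
    for event in events:
        # Decide which branches want the event before copying anything.
        matching = [i for i, keep in enumerate(predicates) if keep(event)]
        for position, i in enumerate(matching):
            if position < len(matching) - 1:
                # More than one branch keeps the event, so this one needs its own copy.
                outputs[i].append(copy.deepcopy(event))
                clones += 1
            else:
                # The last (or only) matching branch takes the original.
                outputs[i].append(event)
    return outputs, clones

events = ["exit", "a", "b", "exit"]
predicates = [
    lambda e: e == "exit",  # select1
    lambda e: e != "exit",  # select2
]
outputs, clones = filter_then_fan_out(events, predicates)
print(outputs)  # [['exit', 'exit'], ['a', 'b']]
print(clones)   # 0
```

With mutually exclusive conditions like these, no event is ever cloned. When conditions overlap, a clone is still made, but only for branches that actually keep the event.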

Notes

This could probably be achieved by introducing an extra node in the tree before the select that combines all `where` clauses, followed by an optimizing step that removes those nodes where they are not needed. Benchmarking will be required.
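The two-step idea might be sketched roughly like this. Everything here is hypothetical: `Select`, `PreFilter`, `insert_prefilters` and `prune_prefilters` are illustrative names, not tremor-runtime types, and the rule for when a node is "not needed" (fewer than two consumers, or no `where` at all) is an assumption rather than something settled in this issue.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Select:
    name: str
    source: str
    where: Optional[str]  # the `where` expression as text, None if the select has none

@dataclass
class PreFilter:
    source: str
    routes: list  # (where expression or None, name of the select to feed)

def insert_prefilters(selects):
    """Step 1: put one PreFilter in front of each source, collecting its consumers' `where` clauses."""
    by_source = {}
    for s in selects:
        node = by_source.setdefault(s.source, PreFilter(s.source, []))
        node.routes.append((s.where, s.name))
    return list(by_source.values())

def prune_prefilters(prefilters):
    """Step 2: keep a PreFilter only where it can save a clone (assumed rule, see lead-in)."""
    return [
        p for p in prefilters
        if len(p.routes) > 1 and any(where is not None for where, _ in p.routes)
    ]

selects = [
    Select("select1", "in", 'event == "exit"'),
    Select("select2", "in", 'event != "exit"'),
    Select("select3", "other", None),  # single consumer, nothing to filter
]
plan = prune_prefilters(insert_prefilters(selects))
print([p.source for p in plan])  # ['in']
```

Only the node in front of `in` survives, since that is the only source where an event would otherwise be cloned and then filtered.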

Licenser, May 09 '22 12:05