questdb icon indicating copy to clipboard operation
questdb copied to clipboard

Better handling of intervals in queries

Open mounte opened this issue 1 year ago • 3 comments

Is your feature request related to a problem?

We found that posing queries having a where statement looking something like: ... WHERE (timestamp BETWEEN a AND b) OR (timestamp BETWEEN c AND d) ... resulted in row scans over the whole dataset which lead to suboptimal performance.

Currently what we did is we did a query of a single timestamp range and essentially pre-processing so we grabbed the min( start_timestamps ) and the max(end_timestamps) then after receiving the dataset we split it up on our own. Even though we processed and transferred more data than necessary it was faster than the full record scan.

Describe the solution you'd like.

Since queries like: WHERE (timestamp BETWEEN a AND b) AND (timestamp BETWEEN c AND d) ... i.e. bolean AND operator instead of OR resulted in a query plan using interval forward scan and by the looks of it the intervals could be an array of multiple intervals.

My direct idea would be to aggregate the given intervals from the query and do a pre-processing step to determine the k number of sorted and non-overlaping intervals based of the query, then use this list of non-overlapping queries as input to the interval forward scan.

Describe alternatives you've considered.

As an alternative you could in the query engine do something similar to what we resorted to, generate the interval from min(start_timestamps) to max(end_timestamps) and use interval scan on that interval combined with filter on timestamp.

Full Name:

Daniel Wedlund

Affiliation:

Mounte AB

Additional context

Had a quick chat on slack regarding this.

mounte avatar Jan 23 '24 13:01 mounte

This could probably be tagged with performance also/instead.

mounte avatar Jan 23 '24 13:01 mounte

Did the slack chat lead to some suggestions for a solution, or a deeper insight? Can you share more details?

Gogis0 avatar Feb 08 '24 19:02 Gogis0

@ideoma stated that "OR is not well optimized in timestamp range filter. Better to avoid if possible" @bziobrowski suggested investigating with explain, and the outcome was condensed in this ticket more or less.

In the end we went with a solution where we simply selected everything from min(all timestamps) to max(all timestamps) of our segments and then we did some post-processing on our end... not super elegant but for us it was faster to transfer more data than necessary.

mounte avatar Feb 13 '24 17:02 mounte