feat: support `WITHIN` filter
I hereby agree to the terms of the GreptimeDB CLA.
Refer to a related PR or issue link (optional)
Part of #3755.
This PR supports within syntax like:
select * from monitors where ts within '2024';
select * from monitors where ts within '2024-04-19';
select * from monitors where ts within '2024-04-19 23:50';
They are converted to
select * from monitors where ts >= '2024-01-01 00:00:00' and ts < '2025-01-01 00:00:00';
select * from monitors where ts >= '2024-04-19 00:00:00' and ts < '2024-04-20 00:00:00';
select * from monitors where ts >= '2024-04-19 23:50:00' and ts < '2024-04-19 23:51:00';
What's changed and what's your intention?
PR Checklist
Please convert it to a draft if some of the following conditions are not met.
- [ ] I have written the necessary rustdoc comments.
- [ ] I have added the necessary unit tests and integration tests.
- [ ] This PR requires documentation updates.
- [ ] API changes are backward compatible.
- [ ] Schema or data changes are backward compatible.
Summary by CodeRabbit
-
New Features
• Enhanced query filtering now accepts year-only date inputs, automatically converting them into valid date ranges.
• Expanded SQL functionality adds new aggregation and data selection options for analyzing system metrics. -
Chores
• Updated an internal dependency for improved stability. -
Tests
• Added new SQL queries and data insertion tests to validate the improved filtering and aggregation behaviors.
[!IMPORTANT]
Review skipped
Auto reviews are disabled on this repository.
Please check the settings in the CodeRabbit UI or the
.coderabbit.yamlfile in this repository. To trigger a single review, invoke the@coderabbitai reviewcommand.You can disable this status message by setting the
reviews.review_statustofalsein the CodeRabbit configuration file.
Walkthrough
This pull request updates the sqlparser dependency in Cargo.toml and introduces new functionality for filtering operations. A new WithinFilterFunction is added to the math functions, and a within_filter module is integrated into the query engine. The planner now rewrites the logical plan with an asynchronous call to a range plan rewriter, and a new WithinFilterRule is applied to transform filter expressions. Additionally, test cases and SQL scripts are enhanced with new queries and assertions, reflecting these updates.
Changes
| File(s) | Change Summary |
|---|---|
| Cargo.toml | Updated sqlparser dependency from the GreptimeTeam repository/revision to the NiwakaDev repository/revision with unchanged features |
| src/common/function/src/scalars/math.rs | Added new WithinFilterFunction with its structure, trait implementations (name, return_type, signature, eval) for boolean output and placeholder error message |
| src/log-query/src/log_query.rs | Modified test case in canonicalize: changed TimeFilter start date from "2023-10-01" to "2023" |
| src/query/src/{lib.rs, planner.rs, query_engine/state.rs, within_filter.rs} | Introduced within_filter module; added a new RangePlanRewriter step in plan_sql; updated QueryEngineState extension_rules with new WithinFilterRule; added WithinFilterRule implementation that transforms filter expressions |
| tests/cases/standalone/common/{basic.result, basic.sql} | Added new SQL queries for selecting, aggregating, and inserting data on the system_metrics table, including filtering by a within_filter predicate |
Sequence Diagram(s)
sequenceDiagram
participant Planner as DfLogicalPlanner
participant RangeRewriter as RangePlanRewriter
participant FilterRule as WithinFilterRule
participant Optimizer as Query Optimizer
Planner->>Planner: Convert SQL to initial plan (sql_to_rel)
Planner->>RangeRewriter: Await rewrite(original_plan)
RangeRewriter->>FilterRule: Analyze LogicalPlan for within_filter predicate
FilterRule-->>RangeRewriter: Return transformed LogicalPlan
RangeRewriter-->>Planner: Provide rewritten LogicalPlan
Planner->>Optimizer: Optimize LogicalPlan
Optimizer-->>Planner: Return optimized LogicalPlan
Suggested reviewers
- waynexia
- zhongzc
- tisonkun
Poem
I'm just a little rabbit, hopping through the code,
Changes bloom like carrots along my winding road.
With within_filter and plans that twist and turn,
I nibble on new features, ever eager to learn.
Coding carrots and crisp SQL—oh, what a joyful spree!
Thank you for using CodeRabbit. We offer it for free to the OSS community and would appreciate your support in helping us grow. If you find it useful, would you consider giving us a shout-out on your favorite social media?
🪧 Tips
Chat
There are 3 ways to chat with CodeRabbit:
- Review comments: Directly reply to a review comment made by CodeRabbit. Example:
I pushed a fix in commit <commit_id>, please review it.Generate unit testing code for this file.Open a follow-up GitHub issue for this discussion.
- Files and specific lines of code (under the "Files changed" tab): Tag
@coderabbitaiin a new review comment at the desired location with your query. Examples:@coderabbitai generate unit testing code for this file.@coderabbitai modularize this function.
- PR comments: Tag
@coderabbitaiin a new PR comment to ask questions about the PR branch. For the best results, please provide a very specific query, as very limited context is provided in this mode. Examples:@coderabbitai gather interesting stats about this repository and render them as a table. Additionally, render a pie chart showing the language distribution in the codebase.@coderabbitai read src/utils.ts and generate unit testing code.@coderabbitai read the files in the src/scheduler package and generate a class diagram using mermaid and a README in the markdown format.@coderabbitai help me debug CodeRabbit configuration file.
Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments.
CodeRabbit Commands (Invoked using PR comments)
@coderabbitai pauseto pause the reviews on a PR.@coderabbitai resumeto resume the paused reviews.@coderabbitai reviewto trigger an incremental review. This is useful when automatic reviews are disabled for the repository.@coderabbitai full reviewto do a full review from scratch and review all the files again.@coderabbitai summaryto regenerate the summary of the PR.@coderabbitai generate docstringsto generate docstrings for this PR. (Beta)@coderabbitai resolveresolve all the CodeRabbit review comments.@coderabbitai configurationto show the current CodeRabbit configuration for the repository.@coderabbitai helpto get help.
Other keywords and placeholders
- Add
@coderabbitai ignoreanywhere in the PR description to prevent this PR from being reviewed. - Add
@coderabbitai summaryto generate the high-level summary at a specific location in the PR description. - Add
@coderabbitaianywhere in the PR title to generate the title automatically.
Documentation and Community
- Visit our Documentation for detailed information on how to use CodeRabbit.
- Join our Discord Community to get help, request features, and share feedback.
- Follow us on X/Twitter for updates and announcements.
@killme2008 @waynexia
I've created a rough implementation of the WITHIN filter:
My idea is
- Parse
WITHINfilters, and treats it as functions in the parser phase(https://github.com/NiwakaDev/sqlparser-rs/commit/b3584823b236db3c244b70945008f0aae65c5e39). - Convert
WITHINfilter functions to normal filters in the planner phase.
This idea is based on RANGE query implementation.
If the approach seems good, I'd like to proceed with it. What do you think?
@coderabbitai review
🧩 Analysis chain
🏁 Script executed:
#!/bin/bash
# Fetch PR details
gh pr view 5397 --json files,body,title,commits
# Search for test files related to WITHIN filter
rg -l "WITHIN" --type rust
# Check for documentation updates
fd -e md -e rs | xargs rg -l "WITHIN"
Length of output: 2304
🌐 Web query:
What is WITHIN filter in time series databases?
💡 Result:
While none of the provided sources explicitly mention a "WITHIN" filter operator, time series databases commonly use three equivalent filtering mechanisms that achieve similar spatial/temporal containment checks:
## Temporal Range Filtering
Used to select entries within a specific time window:
```sql
-- RavenDB example with BETWEEN
SELECT timeseries FROM HeartRates
BETWEEN '2020-05-17T00:00:00' AND '2020-05-17T00:10:00' [1]
-- QuestDB's IN syntax for time containment
SELECT * FROM trades
WHERE timestamp IN '2023-12-20;15s' -- 15-second window [5]
```
## Value Range Filtering
Filters numerical values within specified bounds:
```sql
-- InfluxQL example excluding outliers
WHERE (154 - "value") BETWEEN 0 AND 760 [3]
-- RavenDB value threshold filtering
WHERE ts.Value > 75 [1]
```
## Tag Containment Filtering
Checks if tags exist in a specified set:
```csharp
// RavenDB tag whitelisting
.Where(ts => ts.Tag == "watches/fitbit" ||
ts.Tag == "watches/apple") [1]
```
These patterns align with the core filtering capabilities described across sources:
1. Temporal containment through date ranges (`BETWEEN`, `IN`) [1][5]
2. Numerical range checks using comparison operators [1][3]
3. Set membership checks for tags/attributes [1][4]
While terminology varies between implementations (BETWEEN, IN, WHERE clauses), they all enable selecting data points that fall **within** specified temporal, numerical, or categorical boundaries - essentially implementing different flavors of spatial/temporal containment filters.
Citations:
- 1: https://ravendb.net/docs/article-page/6.2/csharp/document-extensions/timeseries/querying/filtering
- 2: https://docs.datomic.com/reference/filters.html
- 3: https://community.grafana.com/t/how-to-filter-noise-from-time-series-graph/108812
- 4: https://www.alibabacloud.com/blog/key-concepts-and-features-of-time-series-databases_594734
- 5: https://questdb.com/glossary/time-series-database/
- 6: https://devops.com/time-series-database-basics/
- 7: https://www.splunk.com/en_us/blog/learn/time-series-databases.html
🏁 Script executed:
#!/bin/bash
# Check the new within_filter implementation
cat src/query/src/within_filter.rs
# Check test cases
cat tests/cases/standalone/common/basic.sql
cat tests/cases/standalone/common/basic.result
# Check planner integration
cat src/query/src/planner.rs
Length of output: 20487
@killme2008 Here's my review of the WITHIN filter implementation:
-
Documentation:
- Please add rustdoc comments for the
WithinFilterRulestruct and its implementation - Document the supported date formats and examples in the comments
- Please add rustdoc comments for the
-
Error Handling:
- Replace
todo!()macro with proper error handling for unsupported cases - Consider adding custom error types for invalid date formats
- Add validation for the timestamp column type
- Replace
-
Date Format Support:
- Current implementation only supports year format (
'2022') - Consider supporting more formats like:
- Full date:
'2022-11-03' - Month:
'2022-11' - Time ranges:
'1h','7d'
- Full date:
- Current implementation only supports year format (
-
Testing:
- Add more test cases covering:
- Different date formats
- Invalid inputs
- Edge cases (year boundaries)
- Error scenarios
- Add more test cases covering:
The approach of converting WITHIN to range filters looks good, but needs more work on documentation, error handling, and format support. Would you like me to provide specific code examples for any of these points?
:white_check_mark: Actions performed
Review triggered.
Note: CodeRabbit is an incremental review system and does not re-review already reviewed commits. This command is applicable only when automatic reviews are paused.
@killme2008 @waynexia I've created a rough implementation of the
WITHINfilter:My idea is
- Parse
WITHINfilters, and treats it as functions in the parser phase(NiwakaDev/sqlparser-rs@b358482).- Convert
WITHINfilter functions to normal filters in the planner phase.This idea is based on
RANGEquery implementation. If the approach seems good, I'd like to proceed with it. What do you think?
I think it's great! Let's do it!
@killme2008 Marked this PR as review.
Great work! I plan to review this today
@NiwakaDev Hi, do you have time to resolve these comments? I think this PR is almost done, looks great, thank you.
@killme2008
do you have time to resolve these comments?
Sorry for the delay. I'll address the reviews this weekend.
I have a few concerns about adding a new within grammar. It brings more burden of maintaining the sqlparser lib, and is not generic enough (I did some search and see no one else provides similar things).
To achieve a similar goal, @sunng87 provides another alternative method that implements it as a within() function like where within(ts, '2024') or where within(ts, '2024-04-19 23:50'). This is fully compatible with the current sql grammar and people's usual practice.
Furthermore, we can support relative time filters, like where within(ts, '5d'::INTEVAL), or arbitrary time ranges where within(ts, '2024-02-24', '2024-03-15').
But WITHIN is already a keyword, so we need to pick up another name. E.g., with_in, in_range, time_filter etc. I don't have a good choice in my mind, open to suggestions.
Really sorry to put this comment that late 🙇 It just came across to my mind on the drive
@waynexia
I did some search and see no one else provides similar things
It seems like questdb supports a similar feature: https://questdb.com/docs/reference/sql/where/#time-range-where-in. But it might not be generic.
One thing I’m wondering:
if we implement it as a function, is it pushed down like a normal filter?
I have a few concerns about adding a new
withingrammar. It brings more burden of maintaining the sqlparser lib, and is not generic enough (I did some search and see no one else provides similar things).To achieve a similar goal, @sunng87 provides another alternative method that implements it as a
within()function likewhere within(ts, '2024')orwhere within(ts, '2024-04-19 23:50'). This is fully compatible with the current sql grammar and people's usual practice.Furthermore, we can support relative time filters, like
where within(ts, '5d'::INTEVAL), or arbitrary time rangeswhere within(ts, '2024-02-24', '2024-03-15').But
WITHINis already a keyword, so we need to pick up another name. E.g.,with_in,in_range,time_filteretc. I don't have a good choice in my mind, open to suggestions.Really sorry to put this comment that late 🙇 It just came across to my mind on the drive
If you look at the PR in the parser https://github.com/GreptimeTeam/sqlparser-rs/pull/16,
You will find that it doesn't change the existing SQL parser code but adds new parsing logic. So maintenance isn't a problem here at all.
This grammar is supported by QuestDB. I think it's friendlier than a SQL function. And of course, we can support the interval by extending in option:
select * from monitors where ts within '2024-04-19' in '1 day'
Like that.
if we implement it as a function, is it pushed down like a normal filter?
Sure, we can do this just as other functions or operators
You will find that it doesn't change the existing SQL parser code but adds new parsing logic. So maintenance isn't a problem here at all.
You could have a try yourself before you put "not a problem at all" :) RANGE clause doesn't change existing code either.
I think it's friendlier than a SQL function. And of course, we can support the interval by extending in option
Do you really think the next example is easy to understand?
The range query is worth the maintenance cost due to its usefulness and elegance, but I agree with you that within is unnecessary. @waynexia
Could you refactor this PR, @NiwakaDev? Please implement it as an SQL function. Most of the work is already done. Thanks!
@killme2008 @waynexia
Please implement it as an SQL function. Most of the work is already done. Thanks!
The new following approach:
- Registers
WITHINfunction. - Converts
WITHINfunction to a normal filter in the planner phase.
I guess the second step is almost the same. What do you think?
@killme2008 @waynexia
Please implement it as an SQL function. Most of the work is already done. Thanks!
The new following approach:
- Registers
WITHINfunction.- Converts
WITHINfunction to a normal filter in the planner phase.I guess the second step is almost the same. What do you think?
Yes, definitely.
Thank you @NiwakaDev @killme2008 !
@NiwakaDev Sorry to disturb you, but do you have time to review this PR? It's nearly finished and would benefit our users. If not, I would love to help.
@killme2008
Sorry for the delay. March was busy😅
I can finally start working on this PR this week if you don't mind. If I can't make good progress by next week, I'll ask for your help.
Thank you!
@killme2008
Sorry for the delay. March was busy😅
I can finally start working on this PR this week if you don't mind. If I can't make good progress by next week, I'll ask for your help.
Thank you!
That would be great! Thank you.
Close it due to lack of progress; it can be reopened if there are any updates.