greptimedb
greptimedb copied to clipboard
Support setting time range in `Copy From` statement
What type of enhancement is this?
User experience
What does the enhancement do?
As title.
Currently there's timestamp_range: Option<TimestampRange>
field in the struct CopyTableRequest
, but it's not set. We can set it from the SQL parser first, then check if it works.
The copy from
in postgresql supports "where condition" (https://www.postgresql.org/docs/current/sql-copy.html), maybe we can do that, too. For this issue, we can extract the time range from the "where condition", and leave more general filtering in the future.
Implementation challenges
No response
hi @MichaelScofield ,i want to take this, would you give me some further advice?
@naosense Sure. The "copy from" statement is parsed in file "copy_parser.rs", you can start from there. First you might want to see how "where" can be parsed in sqlparser. If lucky, the "copy from" statement from sqlparser may already carry the "where" part. Then you need to find how to extract the desired "time range" from the filters in "where". Finally set the time range in the struct CopyTableRequest
, do some test to see if it works!
hi @MichaelScofield ,i want to take this, would you give me some further advice?
I think we can add a where
clause for the Copy From
statement, then forward the filter of the where
clause to the scan request(The Copy From
statement sends a scan request to retrieve data).
After my initial research, I do not have much idea about solving this issue, probably because I am not familiar with the greptimedb and sqlparser. I hope someone can take over, I'm afraid this issue is beyond my ability😅.
I spent quite a while trying to solve this problem. There has been a small progress, but since I am a newbie in rust, I find it difficult for me to solve rust syntax problems in a short time, such as error handling, type conversion, and syntax sugar. So I decided to post my progress here, welcome to discuss ideas.
First I successfully parsed the where clause and got the expression. As a rookie, it took me a while to get if-else to return the same type of result...
let predicate = if self.parser.parse_keyword(Keyword::WHERE) {
Some(self
.parser
.parse_expr()
.context(error::SyntaxSnafu)?;)
} else {
None
};
Then I found that the returned expression did not meet the needs of the issue. What needs to be returned here is timestampRange. So, the next step is to convert the expression in sql-parser into timestampRange. I think what can be done is to:
- check if the type of the expression is AnyOp or AllOp
- check if the data type is timestamp
- handle various
>
,<
situations - ...
@naosense that's ok, feel free to take another good first issue!
@okJiang we can make things a lot simpler by restricting the allowed syntax here. The filters in copy from
stmt only take one of the following 3 forms:
-
where <F> and <F>
-
where <F> or <F>
-
where <F>
where F = <column> > | >= | < | <= | = <timestamp literal value>
I am trying to solve this problem. It seems that I need to pass arguments to fn to_copy_table_request and I can
- covert string to timestamp with timezone
- check the specified columns is correct.
I can't find a elegant way to pass the parsed arugments.
pub type WhereClause = Option<(
Option<(ObjectName, ParserBinaryOperator, String)>,
Option<ParserBinaryOperator>,
Option<(ObjectName, ParserBinaryOperator, String)>,
)>;
Should I simply pass Expr to the to_copy_table_request function and parse it again, or are there some methods to make the WhereClause more elegant
Should I simply pass Expr to the to_copy_table_request function and parse it again, or are there some methods to make the WhereClause more elegant
@kysshsy For simplicity, what about reusing the time range option? https://github.com/GreptimeTeam/greptimedb/blob/a3a2c8d06340a296c72de79016ecbd113d4f8954/src/operator/src/statement.rs#L335
Then we use the time range to filter output record batches from files.
If we'd like to support WHERE
, maybe we should provide full capabilities of the where expression. We could create a new issue for WHERE
support in COPY
statement.
@kysshsy Since we've marked this issue as GFI, I think we can resort to a simple solution. We don't need to support complex time ranges expressed by where
clause, but to reuse the time range in WITH
options, which is already in CopyTableRequest
:
https://github.com/GreptimeTeam/greptimedb/blob/090b59e8d6406397eb1668dc6c2e4c68e914b664/src/table/src/requests.rs#L231
This will solve 90% use cases.
Once we've built a record batch stream via build_read_stream
:
https://github.com/GreptimeTeam/greptimedb/blob/f93b5b19f05bde852b08366f52f0cf30d09c1c96/src/operator/src/statement/copy_table_from.rs#L208
we can evaluate the batches yielded just like here: https://github.com/GreptimeTeam/greptimedb/blob/40c585890a3cb231dacb5b7cdf1496a06e581826/src/operator/src/statement/copy_table_to.rs#L104-L135