risingwave
risingwave copied to clipboard
feat(SqlSmith): gen Table Subquery ' EXISTS'
I hereby agree to the terms of the Singularity Data, Inc. Contributor License Agreement.
What's changed and what's your intention?
EXISTS implement
Checklist
- [x] I have written necessary rustdoc comments
- [x] I have added necessary unit tests and integration tests
- [x] All checks passed in
./risedev check
(or alias,./risedev c
)
Refer to a related PR or issue link (optional)
close #4296
There is 1 out of 1024 query that I run have this issue https://github.com/singularity-data/risingwave/issues/4434 . Need to fix this first
Codecov Report
Merging #4431 (1845a80) into main (a07f267) will decrease coverage by
0.02%
. The diff coverage is74.61%
.
@@ Coverage Diff @@
## main #4431 +/- ##
==========================================
- Coverage 73.86% 73.83% -0.03%
==========================================
Files 1002 1004 +2
Lines 162583 162749 +166
==========================================
+ Hits 120088 120166 +78
- Misses 42495 42583 +88
Flag | Coverage Δ | |
---|---|---|
rust | 73.83% <74.61%> (-0.03%) |
:arrow_down: |
Flags with carried forward coverage won't be shown. Click here to find out more.
Impacted Files | Coverage Δ | |
---|---|---|
src/batch/src/executor/group_top_n.rs | 68.42% <ø> (-6.44%) |
:arrow_down: |
...batch/src/executor/join/distributed_lookup_join.rs | 0.00% <0.00%> (ø) |
|
src/batch/src/executor/join/nested_loop_join.rs | 91.73% <ø> (ø) |
|
src/batch/src/executor/join/sort_merge_join.rs | 79.04% <ø> (ø) |
|
src/batch/src/executor/order_by.rs | 95.32% <ø> (ø) |
|
src/batch/src/executor/project_set.rs | 76.15% <ø> (ø) |
|
src/batch/src/executor/sys_row_seq_scan.rs | 0.00% <0.00%> (ø) |
|
src/batch/src/executor/top_n.rs | 75.00% <ø> (ø) |
|
src/batch/src/executor/update.rs | 81.36% <ø> (ø) |
|
.../batch/src/task/consistent_hash_shuffle_channel.rs | 0.00% <0.00%> (ø) |
|
... and 107 more |
:mega: We’re building smart automated test selection to slash your CI/CD build times. Learn more
There is 1 out of 1024 query that I run have this issue #4434 . Need to fix this first
alright, I think it is nothing related to exists, we can merge it. Because the problem occur in subquery with aggregate. Probably this probability is very low, until now it is notice.
For some
exists
query, e.g.SELECT t1.* FROM t1 WHERE EXISTS (SELECT * FROM t3 WHERE t1.c1 = t3.c1)
It can be transformed into a hash join, not necessarily a nested loop join.
SELECT t1.* FROM t1, t3 WHERE t1.c1 = t3.c1
So we can probably do more than this.
Clarification: Do you mean that exists can be transformed into something we already test (hash join), so we probably don't need to generate it, there are other more important things to generate?
We still need to generate in these cases as we can test the transformed into something
.
I mean, we can generate correlated subqueries in the exists
clause.
There is a comment in this PR saying:
// TODO: Streaming nested loop join is not implemented yet. // Tracked by: https://github.com/singularity-data/risingwave/issues/2655.`
I thought it tries to say that because of this limitation, there is a set of queries we cannot generate in exists
clause.
So I just try to clarify that there is a subset of queries we still can generate.
~~Any updates?~~ Edit: Zongyu's laptop is broken unfortunately. Resume after new laptop is setup.
Currently a few feature not yet implemented
- Local subquery inside Aggregate
- Correlated subquery in HAVING or SELECT with agg is not implemented
so I decided to workaround, as such If currently the subquery is inside an aggregate function, then it won't generate EXISTS subquery
With this, when I want to generate correlated subquery. The following problem shown up https://github.com/risingwavelabs/risingwave/issues/6350
As for Imatz comment on the possible EXISTS subquery generated by materialised view. I will look into what I can do in a later time.
The problem I facing is "Feature is not yet implemented: correlated subquery in HAVING or SELECT with agg,"
This show that if I would want to generate correlated query (Not the local query), I need to pass the information about "Cannot have agg" all the way until gen_expr which is too much of parameter passing.
I am planning to refactor some code to able to workaround unsupported feature.
- put the two variable 'inside_agg' and 'can_agg' (https://github.com/risingwavelabs/risingwave/blob/a233051741fbd03c68ed1150adf95fcd64961c7d/src/tests/sqlsmith/src/expr.rs#L63) as internal state instead of passing around gen_expr. Meaning that the variable will become global to all the methods (functions).
- Since the state is "global" to the method so I need to ensure correctness, the same way as how the gen_local_query done is, saving all the state and restoring the state upon leaving the function. (This may need some some refactor).
- By doing so, we can actually think of a few more internal state that can help us with workaround for example, if it is mv, then we avoid generating Nested Loop Join.
Previously, I didn't, store it as internal state because the function is keep on being called with in different context so to avoid mistake when I put them in internal state I didn't put them in internal state. But then now I feel it would be better to refactor this part to allow more workaround.
Since this PR is not about correlated query, and it is about EXISTS, would it be alright if I dun generate correlated query (meaning I generate local context) and have another PR about generating correlated query.
I need some advise on this as I think the refactor may need to change not less part.
Since this PR is not about correlated query, and it is about EXISTS, would it be alright if I dun generate correlated query (meaning I generate local context) and have another PR about generating correlated query.
Sounds good.
This show that if I would want to generate correlated query (Not the local query), I need to pass the information about "Cannot have agg" all the way until gen_expr which is too much of parameter passing.
This is a good point, I think the way we workaround unsupported features have added complexity to the code.
I think an alternative to your suggestion is to handle by matching on the error unimplemented!()
. I think this will handle unimplemented expressions without having to manually check can_agg
, inside_agg
depending on the unsupported expression.
So sqlsmith will still generate these invalid queries, but will just ignore them.
This is similar to our approach for permissible runtime errors: https://github.com/risingwavelabs/risingwave/blob/7a12a6ba0d6977d1363888952e4b7d46bb65cc20/src/tests/sqlsmith/src/runner.rs#L139-L145
Drawback is if sqlsmith generates too many of these. We can verify if this is the case by writing sqlsmith test to check if percentage of generated unimplemented!()
queries is above a certain threshold.
Lets continue discussion under this issue https://github.com/risingwavelabs/risingwave/issues/6592
Hey @marvenlee2486, this pull request failed to merge and has been dequeued from the merge train. If you believe your PR failed in the merge train because of a flaky test, requeue it by clicking "Update branch" or pushing an empty commit with git commit --allow-empty -m "rerun" && git push
.