risingwave icon indicating copy to clipboard operation
risingwave copied to clipboard

feat(SqlSmith): gen Table Subquery ' EXISTS'

Open marvenlee2486 opened this issue 2 years ago • 5 comments

I hereby agree to the terms of the Singularity Data, Inc. Contributor License Agreement.

What's changed and what's your intention?

EXISTS implement

Checklist

  • [x] I have written necessary rustdoc comments
  • [x] I have added necessary unit tests and integration tests
  • [x] All checks passed in ./risedev check (or alias, ./risedev c)

Refer to a related PR or issue link (optional)

close #4296

marvenlee2486 avatar Aug 04 '22 07:08 marvenlee2486

There is 1 out of 1024 query that I run have this issue https://github.com/singularity-data/risingwave/issues/4434 . Need to fix this first

marvenlee2486 avatar Aug 04 '22 07:08 marvenlee2486

Codecov Report

Merging #4431 (1845a80) into main (a07f267) will decrease coverage by 0.02%. The diff coverage is 74.61%.

@@            Coverage Diff             @@
##             main    #4431      +/-   ##
==========================================
- Coverage   73.86%   73.83%   -0.03%     
==========================================
  Files        1002     1004       +2     
  Lines      162583   162749     +166     
==========================================
+ Hits       120088   120166      +78     
- Misses      42495    42583      +88     
Flag Coverage Δ
rust 73.83% <74.61%> (-0.03%) :arrow_down:

Flags with carried forward coverage won't be shown. Click here to find out more.

Impacted Files Coverage Δ
src/batch/src/executor/group_top_n.rs 68.42% <ø> (-6.44%) :arrow_down:
...batch/src/executor/join/distributed_lookup_join.rs 0.00% <0.00%> (ø)
src/batch/src/executor/join/nested_loop_join.rs 91.73% <ø> (ø)
src/batch/src/executor/join/sort_merge_join.rs 79.04% <ø> (ø)
src/batch/src/executor/order_by.rs 95.32% <ø> (ø)
src/batch/src/executor/project_set.rs 76.15% <ø> (ø)
src/batch/src/executor/sys_row_seq_scan.rs 0.00% <0.00%> (ø)
src/batch/src/executor/top_n.rs 75.00% <ø> (ø)
src/batch/src/executor/update.rs 81.36% <ø> (ø)
.../batch/src/task/consistent_hash_shuffle_channel.rs 0.00% <0.00%> (ø)
... and 107 more

:mega: We’re building smart automated test selection to slash your CI/CD build times. Learn more

codecov[bot] avatar Aug 04 '22 07:08 codecov[bot]

There is 1 out of 1024 query that I run have this issue #4434 . Need to fix this first

alright, I think it is nothing related to exists, we can merge it. Because the problem occur in subquery with aggregate. Probably this probability is very low, until now it is notice.

marvenlee2486 avatar Aug 05 '22 06:08 marvenlee2486

For some exists query, e.g.

SELECT t1.* FROM t1 WHERE EXISTS (SELECT * FROM t3 WHERE t1.c1 = t3.c1)

It can be transformed into a hash join, not necessarily a nested loop join.

SELECT t1.* FROM t1, t3 WHERE t1.c1 = t3.c1

So we can probably do more than this.

Clarification: Do you mean that exists can be transformed into something we already test (hash join), so we probably don't need to generate it, there are other more important things to generate?

kwannoel avatar Aug 05 '22 07:08 kwannoel

We still need to generate in these cases as we can test the transformed into something. I mean, we can generate correlated subqueries in the exists clause.

There is a comment in this PR saying:

// TODO: Streaming nested loop join is not implemented yet. // Tracked by: https://github.com/singularity-data/risingwave/issues/2655.`

I thought it tries to say that because of this limitation, there is a set of queries we cannot generate in exists clause. So I just try to clarify that there is a subset of queries we still can generate.

lmatz avatar Aug 05 '22 07:08 lmatz

~~Any updates?~~ Edit: Zongyu's laptop is broken unfortunately. Resume after new laptop is setup.

kwannoel avatar Aug 22 '22 11:08 kwannoel

Currently a few feature not yet implemented

  1. Local subquery inside Aggregate
  2. Correlated subquery in HAVING or SELECT with agg is not implemented

so I decided to workaround, as such If currently the subquery is inside an aggregate function, then it won't generate EXISTS subquery

With this, when I want to generate correlated subquery. The following problem shown up https://github.com/risingwavelabs/risingwave/issues/6350

As for Imatz comment on the possible EXISTS subquery generated by materialised view. I will look into what I can do in a later time.

marvenlee2486 avatar Nov 14 '22 09:11 marvenlee2486

The problem I facing is "Feature is not yet implemented: correlated subquery in HAVING or SELECT with agg,"

This show that if I would want to generate correlated query (Not the local query), I need to pass the information about "Cannot have agg" all the way until gen_expr which is too much of parameter passing.

I am planning to refactor some code to able to workaround unsupported feature.

  1. put the two variable 'inside_agg' and 'can_agg' (https://github.com/risingwavelabs/risingwave/blob/a233051741fbd03c68ed1150adf95fcd64961c7d/src/tests/sqlsmith/src/expr.rs#L63) as internal state instead of passing around gen_expr. Meaning that the variable will become global to all the methods (functions).
  2. Since the state is "global" to the method so I need to ensure correctness, the same way as how the gen_local_query done is, saving all the state and restoring the state upon leaving the function. (This may need some some refactor).
  3. By doing so, we can actually think of a few more internal state that can help us with workaround for example, if it is mv, then we avoid generating Nested Loop Join.

Previously, I didn't, store it as internal state because the function is keep on being called with in different context so to avoid mistake when I put them in internal state I didn't put them in internal state. But then now I feel it would be better to refactor this part to allow more workaround.

Since this PR is not about correlated query, and it is about EXISTS, would it be alright if I dun generate correlated query (meaning I generate local context) and have another PR about generating correlated query.

I need some advise on this as I think the refactor may need to change not less part.

marvenlee2486 avatar Nov 25 '22 12:11 marvenlee2486

Since this PR is not about correlated query, and it is about EXISTS, would it be alright if I dun generate correlated query (meaning I generate local context) and have another PR about generating correlated query.

Sounds good.

kwannoel avatar Nov 25 '22 13:11 kwannoel

This show that if I would want to generate correlated query (Not the local query), I need to pass the information about "Cannot have agg" all the way until gen_expr which is too much of parameter passing.

This is a good point, I think the way we workaround unsupported features have added complexity to the code. I think an alternative to your suggestion is to handle by matching on the error unimplemented!(). I think this will handle unimplemented expressions without having to manually check can_agg, inside_agg depending on the unsupported expression.

So sqlsmith will still generate these invalid queries, but will just ignore them.

This is similar to our approach for permissible runtime errors: https://github.com/risingwavelabs/risingwave/blob/7a12a6ba0d6977d1363888952e4b7d46bb65cc20/src/tests/sqlsmith/src/runner.rs#L139-L145

Drawback is if sqlsmith generates too many of these. We can verify if this is the case by writing sqlsmith test to check if percentage of generated unimplemented!() queries is above a certain threshold.

kwannoel avatar Nov 25 '22 13:11 kwannoel

Lets continue discussion under this issue https://github.com/risingwavelabs/risingwave/issues/6592

marvenlee2486 avatar Nov 25 '22 19:11 marvenlee2486

Hey @marvenlee2486, this pull request failed to merge and has been dequeued from the merge train. If you believe your PR failed in the merge train because of a flaky test, requeue it by clicking "Update branch" or pushing an empty commit with git commit --allow-empty -m "rerun" && git push.

mergify[bot] avatar Nov 25 '22 19:11 mergify[bot]