risingwave
risingwave copied to clipboard
Tracking Issue: Port sqlsmith to rust for sql fuzzy testing
SQLsmith is a random SQL query generator. It has found many bugs across the database industry.
The original Sqlsmith was licensed under gplv2 which is unfriendly to use. https://github.com/anse1/sqlsmith CockroachDB has implemented a golang version of sqlsmith:
- https://sourcegraph.com/github.com/cockroachdb/cockroach@0782befa48d54a67c3830963280838137542539e/-/blob/pkg/internal/sqlsmith/sqlsmith.go
- https://www.cockroachlabs.com/blog/sqlsmith-randomized-sql-testing/
The remaining work of Sqlsmith:
- [x] #3365
- [x] #3995
- [x] #3959
- [x] #3369
- [x] #4173
- [x] #3366
- [x] #3367
- [x] #3962
- [x] #4540
- [x] #3844
- [x] #3858
- [x] #3878
- [x] #4462
Test suite:
- [x] #4086
- [ ] #4139
- [x] #4056
- [x] ci: enable frontend tests when submitting PR which changes sqlsmith
- [x] #4173
- [x] Sqlsmith: dump setup sql (create table, create materialized view)
- [x] Sqlsmith: add developer docs
RisingWave bugs reported:
- [x] #3356
- [x] #3342
- [x] #3341
- [x] ~~#3908~~
- [x] #4083
- [ ] #4220
- [ ] #4344
- [x] #4399
Tracking issue for workarounds:
- #3896
Other Notes:
- Please run e2e tests locally when submitting PRs, it is not yet integrated into CI as it is still unstable.
- Please prioritize stabilizing SqlSmith, so we can include in our e2e test suite. Will appreciate closing out bugs with SqlSmith and RisingWave first!
Thank you for your contributions!
cc @sumittal
Wondering if something like Quickcheck
's shrink
would be useful here. Do generated queries get very complex and hard to debug?
Perhaps can be used for other fuzzers we eventually use too.
If a generated test case fails, Quickcheck's shrink takes the input and shrinks it as much as possible, to make it easier to debug. In our case, we need to take the generated query and shrink it, producing a simplified query. With the simplified query we re-run it to try and reproduce a crash. If it crashes, we can then use simplified query to debug.
Some ways I can think of shrinking, off the top of my head:
- Reducing no. of terms selected, projected etc...
- Extracting subqueries.
As part of fuzzy test, we should also generate invalid sql, can sqlsmit support that?
As part of fuzzy test, we should also generate invalid sql, can sqlsmit support that?
Currently, no. Invalid sql is too random. If there's unlimited time, surely you can do that.
Do generated queries get very complex and hard to debug?
No, actually. The cockroach version has parameters to control the overall complexity of the generated sql. It will limit the recursions (subqueries), select items, and tables in a join. So generally, we can let the sql adapt to our wishes.
Remaining refinements / workarounds / bugs are tracked in #3896 . Closing this issue.