risingwave icon indicating copy to clipboard operation
risingwave copied to clipboard

Tracking Issue: Port sqlsmith to rust for sql fuzzy testing

Open neverchanje opened this issue 2 years ago • 5 comments

SQLsmith is a random SQL query generator. It has found many bugs across the database industry.

The original Sqlsmith was licensed under gplv2 which is unfriendly to use. https://github.com/anse1/sqlsmith CockroachDB has implemented a golang version of sqlsmith:

  • https://sourcegraph.com/github.com/cockroachdb/cockroach@0782befa48d54a67c3830963280838137542539e/-/blob/pkg/internal/sqlsmith/sqlsmith.go
  • https://www.cockroachlabs.com/blog/sqlsmith-randomized-sql-testing/

The remaining work of Sqlsmith:

  • [x] #3365
  • [x] #3995
  • [x] #3959
  • [x] #3369
  • [x] #4173
  • [x] #3366
  • [x] #3367
  • [x] #3962
  • [x] #4540
  • [x] #3844
  • [x] #3858
  • [x] #3878
  • [x] #4462

Test suite:

  • [x] #4086
  • [ ] #4139
  • [x] #4056
  • [x] ci: enable frontend tests when submitting PR which changes sqlsmith
  • [x] #4173
  • [x] Sqlsmith: dump setup sql (create table, create materialized view)
  • [x] Sqlsmith: add developer docs

RisingWave bugs reported:

  • [x] #3356
  • [x] #3342
  • [x] #3341
  • [x] ~~#3908~~
  • [x] #4083
  • [ ] #4220
  • [ ] #4344
  • [x] #4399

Tracking issue for workarounds:

  • #3896

Other Notes:

  • Please run e2e tests locally when submitting PRs, it is not yet integrated into CI as it is still unstable.
  • Please prioritize stabilizing SqlSmith, so we can include in our e2e test suite. Will appreciate closing out bugs with SqlSmith and RisingWave first!

Thank you for your contributions!

neverchanje avatar May 16 '22 07:05 neverchanje

cc @sumittal

liurenjie1024 avatar May 16 '22 09:05 liurenjie1024

Wondering if something like Quickcheck's shrink would be useful here. Do generated queries get very complex and hard to debug?

Perhaps can be used for other fuzzers we eventually use too.

If a generated test case fails, Quickcheck's shrink takes the input and shrinks it as much as possible, to make it easier to debug. In our case, we need to take the generated query and shrink it, producing a simplified query. With the simplified query we re-run it to try and reproduce a crash. If it crashes, we can then use simplified query to debug.

Some ways I can think of shrinking, off the top of my head:

  • Reducing no. of terms selected, projected etc...
  • Extracting subqueries.

kwannoel avatar Jun 27 '22 04:06 kwannoel

As part of fuzzy test, we should also generate invalid sql, can sqlsmit support that?

liurenjie1024 avatar Jun 27 '22 05:06 liurenjie1024

As part of fuzzy test, we should also generate invalid sql, can sqlsmit support that?

Currently, no. Invalid sql is too random. If there's unlimited time, surely you can do that.

neverchanje avatar Jul 04 '22 01:07 neverchanje

Do generated queries get very complex and hard to debug?

No, actually. The cockroach version has parameters to control the overall complexity of the generated sql. It will limit the recursions (subqueries), select items, and tables in a join. So generally, we can let the sql adapt to our wishes.

neverchanje avatar Jul 04 '22 02:07 neverchanje

Remaining refinements / workarounds / bugs are tracked in #3896 . Closing this issue.

kwannoel avatar Aug 16 '22 05:08 kwannoel