Use proptest in codegen tests

Open nicoabie opened this issue 2 years ago • 1 comments

What kind of change does this PR introduce?

POC of using proptest to help finish the parser (#51). What I want to show with this little PR is how easy it is to generate families of tests.

What is the current behavior?

Static unit tests

What is the new behavior?

Generated unit tests

Additional context

There are two crates, quickcheck and proptest; I found the documentation of proptest easier to understand for newcomers. I've used PBT (property-based testing) in different languages (Prolog, Scheme, JavaScript and Python) but never in Rust, so this is my first time.

https://altsysrq.github.io/proptest-book/intro.html

Dec 14 '23 20:12 nicoabie

The value is that you don't need to write all the possible scenarios that will produce the same output from the lexer.

What are all the possible statements that produce vec![TokenProperty::from(SyntaxKind::Select)]?

select *;
select *
select 1;
select 1
select 'a';
select 'a'
select 1 as alias;
select 1 as alias
select 'a' as alias;
select 'a' as alias

And then all the combinations of the above that select more than one field.

Maybe more? I didn't get into the details of how it works.
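To make the size of that family concrete, here is a minimal sketch that builds the statements listed above as a cross product of projections and an optional trailing semicolon. The function name `select_variants` is hypothetical and not part of this PR; it uses only the standard library, so it illustrates the input space a proptest strategy would sample from rather than the strategy itself.

```rust
/// Builds the `select` statements listed above as a cross product of
/// projections and an optional trailing semicolon.
/// Hypothetical helper for illustration; not part of the PR.
fn select_variants() -> Vec<String> {
    // Projections taken from the list in this comment.
    let projections = ["*", "1", "'a'", "1 as alias", "'a' as alias"];
    // Every statement appears once with and once without a semicolon.
    let terminators = [";", ""];

    let mut statements = Vec::new();
    for projection in projections.iter() {
        for terminator in terminators.iter() {
            statements.push(format!("select {}{}", projection, terminator));
        }
    }
    statements
}
```

Five projections and two terminators already give ten statements, and each extra dimension (more fields, other literals) multiplies that number.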

And now:

1 can be any number with 1, 2, 3, ... N digits
'a' can be any valid sequence of chars
alias can be any valid sequence of chars
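One catch with "any valid sequence of chars" is that a generated value may itself contain a single quote, which has to be doubled inside a SQL string literal. The sketch below shows one way to turn generated values into statement text; `sql_string_literal` and `select_with_alias` are hypothetical helper names, and the escaping assumes standard-conforming strings, where backslashes are not special.

```rust
/// Renders an arbitrary string as a SQL string literal by wrapping it in
/// single quotes and doubling any embedded single quote.
/// Hypothetical helper; assumes standard-conforming strings.
fn sql_string_literal(value: &str) -> String {
    format!("'{}'", value.replace('\'', "''"))
}

/// Renders `select <literal> as <alias>` from an already rendered literal
/// and an alias. The alias is inserted as-is, so it must already be a
/// valid identifier.
fn select_with_alias(literal: &str, alias: &str) -> String {
    format!("select {} as {}", literal, alias)
}
```

A generated number can be passed through `to_string()` and a generated string through `sql_string_literal` before being handed to `select_with_alias`.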

Do 'contact' or 'apple' represent all the possible table names? Not really, so you can have a custom arbitrary that generates valid names respecting the Postgres constraints:

Length: Up to 63 characters (63 bytes by default).
Characters: Start with a letter or an underscore, followed by letters, numbers, or underscores.
Case: Unquoted names are case-insensitive (Postgres folds them to lowercase), but it's good practice to use lowercase to avoid confusion.
Reserved Words: Avoid using reserved words like "select", "insert", "update", etc.
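These rules can be written down as a predicate that a custom generator must satisfy. The following is a minimal sketch under stated assumptions: `is_valid_table_name` and `RESERVED_WORDS` are hypothetical names, the reserved list contains only the three example words from this comment (not the full Postgres keyword list), and only ASCII characters are accepted, which is stricter than what Postgres itself allows.

```rust
/// Example reserved words from the comment above.
/// Deliberately incomplete; not the full Postgres keyword list.
const RESERVED_WORDS: [&str; 3] = ["select", "insert", "update"];

/// Returns true if `name` satisfies the constraints listed above.
/// Hypothetical helper; restricted to ASCII for simplicity.
fn is_valid_table_name(name: &str) -> bool {
    // Length: 1 to 63 characters. Non-ASCII input is rejected below,
    // so for accepted names bytes and characters coincide.
    if name.is_empty() || name.len() > 63 {
        return false;
    }
    let mut chars = name.chars();
    // First character: a letter or an underscore.
    let first_ok = match chars.next() {
        Some(c) => c.is_ascii_alphabetic() || c == '_',
        None => false,
    };
    // Remaining characters: letters, digits, or underscores.
    let rest_ok = chars.all(|c| c.is_ascii_alphanumeric() || c == '_');
    // Reserved words are compared case-insensitively.
    let reserved = RESERVED_WORDS
        .iter()
        .any(|word| word.eq_ignore_ascii_case(name));
    first_ok && rest_ok && !reserved
}
```

With proptest, one option would be a regex string strategy such as `"[a-z_][a-z0-9_]{0,62}"` combined with `prop_filter` to reject reserved words; the predicate above can then serve as the property that every generated name must pass.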

How many unit tests would you need to really make sure test_select_with_where works? Let's see all the possible combinations:

all the combinations of columns that go into test_simple_select
all the combinations of different table names that are valid in Postgres, as described in the previous section
all the different comparison operators (you can have a custom arbitrary)
right operands that are numbers or strings (you can have a custom arbitrary)
the where clause in that test is a very simple one; it could have ANDs, ORs, etc. (you can have a custom arbitrary)
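The dimensions listed above map naturally onto a small data type that a custom arbitrary could generate and then render to SQL. This is only a sketch: `Operand`, `Condition` and the render functions are hypothetical names, not the types used by the parser in this repository.

```rust
/// Right operand of a comparison: a number or a string.
enum Operand {
    Number(i64),
    Text(String),
}

/// A `where` condition: a single comparison, or an AND/OR of two conditions.
enum Condition {
    Compare { column: String, operator: &'static str, operand: Operand },
    And(Box<Condition>, Box<Condition>),
    Or(Box<Condition>, Box<Condition>),
}

/// Renders an operand; strings are quoted and embedded quotes doubled.
fn render_operand(operand: &Operand) -> String {
    match operand {
        Operand::Number(n) => n.to_string(),
        Operand::Text(s) => format!("'{}'", s.replace('\'', "''")),
    }
}

/// Renders a condition, parenthesizing AND/OR so nesting stays unambiguous.
fn render_condition(condition: &Condition) -> String {
    match condition {
        Condition::Compare { column, operator, operand } => {
            format!("{} {} {}", column, operator, render_operand(operand))
        }
        Condition::And(left, right) => {
            format!("({} and {})", render_condition(left), render_condition(right))
        }
        Condition::Or(left, right) => {
            format!("({} or {})", render_condition(left), render_condition(right))
        }
    }
}

/// Renders a full statement for a given table and condition.
fn render_select_with_where(table: &str, condition: &Condition) -> String {
    format!("select * from {} where {};", table, render_condition(condition))
}
```

In proptest, the leaf comparisons could be built with `prop_oneof!` over operators and operands, and the AND/OR nesting with `prop_recursive`; each generated `Condition` then yields one test input, with no need to write the combinations by hand.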

That is the value of property testing: there is no way one can write all the combinations by hand. I guess it depends on the confidence you want or need.

The question is: how do you get the LLM to cover that much of the problem domain?

Dec 17 '23 14:12 nicoabie