mm0 icon indicating copy to clipboard operation
mm0 copied to clipboard

first tests for (mm0 parser, mmu parser, mm0+mmu proof checker)

Open Lakedaemon opened this issue 3 years ago • 32 comments

Please tell me what is wrong so that I can fix it and make the next prs behave better. Best regards, Olivier

Lakedaemon avatar Apr 23 '21 12:04 Lakedaemon

I pushed some changes.

  • I shortened the names of the test files; there is no need to go into details and it's annoying to deal with spaces and long names when calling these from the console.
  • The comments are more minimal - in a file that would otherwise only be 6 characters long, a few sentences of comments actually makes the intent of the file less clear. For more complicated and subtle tests a short sentence fragment should be sufficient. Remember, we're going for quantity not quality here.
  • The folder organization was changed to tests/mm0_mmu/fail/test.mm0. The intent is that the top level directory specifies the test format, meaning that all mm0_mmu tests can be run in the same way, by passing MM0 + MMU to a verifier.
  • A tests/mm0_mmu/run-mm0-hs.sh script has been added to run a single test from the pass or fail directory using the mm0-hs tool. More scripts should be added to run other tools, anything that can consume an MM0 + MMU pair. I encourage you to add a similar script to invoke mm0kt, and even PR it here. You should include a link to the installation instructions though, for people who don't have mm0kt on their path.
  • The organization into phases like matching, registering and so on was dropped. You can indicate this in the name of the test if you like, but I don't think the tests need to be any further organized than this. (I'm expecting the mm0_mmu/fail directory to grow to potentially thousands of files without any additional hierarchy.) This is easiest for tooling.

By the way, if it helps any I'm taking https://github.com/leanprover-community/lean/tree/master/tests/lean for inspiration. I don't have *.expected.out files because diagnostic messages are tool specific, although perhaps the fail tests can get expected out files for various tools.

digama0 avatar Apr 23 '21 14:04 digama0

Ok, I'll do it that way, to the best of my capabilities.

when matching mm0 statement vs mmu directive, it's a bit like having an expected.out and checking it, I guess. I wonder if/how it could be done for other stages.

Lakedaemon avatar Apr 23 '21 15:04 Lakedaemon

Am I supposed to do something to merge those 2 tests (that you beautified) ? Is it finished ? For the next tests, am I supposed to commit them in this pr or am I supposed to open another one ?

Lakedaemon avatar Apr 23 '21 15:04 Lakedaemon

The existing tests are fine, but you can add more tests to this PR. You had quite a few more last time you showed me.

digama0 avatar Apr 23 '21 15:04 digama0

ok, will do

Lakedaemon avatar Apr 23 '21 15:04 Lakedaemon

those good ?

Lakedaemon avatar Apr 23 '21 17:04 Lakedaemon

another batch Waiting for feedback and instructions for the next tests

Lakedaemon avatar Apr 23 '21 17:04 Lakedaemon

these are good modulo pass/whitespace2

digama0 avatar Apr 23 '21 17:04 digama0

I'll start working on the next batch then...and slowly resume implementing the improved kotlin patcher for mm0 (and a patch for set.mm.mm0)

Lakedaemon avatar Apr 23 '21 18:04 Lakedaemon

bad-math-string2.mm0 serves no purpose and should be removed :/

Lakedaemon avatar Apr 23 '21 19:04 Lakedaemon

I don't think there is anything wrong with bad-math-string2, that's a possible parsing error. Same with delimiter $ $ $ $ $ $;

digama0 avatar Apr 23 '21 20:04 digama0

You are the boss, you know best.

The list inside delimiter was to test the fact that math string may hold ANY ascii char except '$' also, those are not multicharacter delimiters, that is just a space separated list of 1 char Ascii chars (who are displayed weirdly by their name when they are not printable)

Indeed, it wouldn't hurt to prevent the use of all those legacy and useless ascii (not printable) char.

Lakedaemon avatar Apr 23 '21 20:04 Lakedaemon

I don't think there is anything wrong with bad-math-string2, that's a possible parsing error. Same with delimiter $ $ $ $ $ $;

oh yeah, nice one.

I'm still a bit confused about if/how I should/could test things. I fear that this is going to be hard to test that a developper implements parsing the right way (with the right structure for stuff).

I made 3 attemps at developping parsers for mm0 (and 1 for metamath), So I tried a few things (some of them stupid), so I have some experience in doing dangerous things : like skipping unparsed balanced parentheses strings to parse them later in a different process. It gives me some ideas to help other developpers prevent the same pitfalls :

  • using a third party isWhiteSpace() to skip mm0/mmu whitespace
  • implementing/hacking things quickly while discovering the mm0/mmu specs and not handling correctly corner cases...
  • misunderstanding the spec... But still, I fear that doing a good testing job is going to be hard

You probably have it easier with the mmb format where everything is (certainly) located in a convenient place and you do not have parsing (structure) issues. Here, that's me being jealous :)

Lakedaemon avatar Apr 24 '21 05:04 Lakedaemon

what about too big ints, what should developper do to have valid proofcheckers in this case ? prefix p : $+$ prec 9999999999999999999999999999999999999999999999999999999999999999;

Say, if there is no limit to ints, and a big int overflows yet the developper doesn't guard agains that. So an (mostly) infinite precedence becomes 1 and all math parses become different. Shouldn't we guard agains that ? Looks to me like an exploitable case

Lakedaemon avatar Apr 24 '21 05:04 Lakedaemon

in the spec, shouldn't math-strings be called something else ? I'll be doing mostly maths but you have been doing weirder stuff. Wouldn't another term be more suitable ? treeString ? formula ? parsable ? Just a little morning tease to make, sure you are alert and operational first thing in the morning ! :)

Lakedaemon avatar Apr 24 '21 05:04 Lakedaemon

Regarding primitives, maybe we should test against balanced parenthesis... but I do not see yet how that could be done simply (maybe later when we get to dynamic parsing/proofs)

For comments, we probably should test against weirdly place comments (in whitestuff inside statements/dirrectives) consuming lines should be done right also, so that no part of the file disappear because of rogue comments (exploitable ?) but how to do that ?

Lakedaemon avatar Apr 24 '21 05:04 Lakedaemon

arhhhh line comment don't have to start a line !!!!!!

because of that file ::= (lexeme | whitespace)* whitespace ::= whitestuff+ whitestuff ::= whitechar | line-comment

We can have line comments anywhere a space is ok

so this is valid mm0 stuff

sort-- comment inside a statement s;

Which makes catching/storing/somewhat preserving line comments a complete headache (for a patcher/transformation tool)

That's it...I'm crying ;_;

Lakedaemon avatar Apr 24 '21 06:04 Lakedaemon

mmh. So I got whitespace wrong from the start... I have been thinking about how to preserve space/formatting and there might be a reasonnable simple/performant way to do it. I'll have a go at it.

Lakedaemon avatar Apr 24 '21 07:04 Lakedaemon

As I do not know yet how to simply test balanced parenthesis stuff, parsing primitives are mostly done I'll now start testing higher level stuff (from the bottom up). First commes mm0 statements and mmu directives (separately)

Lakedaemon avatar Apr 24 '21 08:04 Lakedaemon

for sort fail test, we should probably check the order of modifyers...

Lakedaemon avatar Apr 24 '21 08:04 Lakedaemon

I'm wondering if I should commit fail tests for 3 modifiers 4 * 5 = 20 tests :/ and fail test for 4 modifiers = 23 tests :/

If we were thorough, yes... now, betting on what you'll want... :) and waiting for further instructions...

Lakedaemon avatar Apr 24 '21 08:04 Lakedaemon

Let's see. You are a world expert on logic, formalism and the whole point of mm0 is to make sure that there is more truth, less false truth, less bugs in software and maths (and who knows where else...)...

I'll prepare the 43 additionnal tests, while waiting ^^

Lakedaemon avatar Apr 24 '21 10:04 Lakedaemon

when you have this term a: s > t > s;

what names do you give to binders in the mmu file ? (because of alpha renaming, we can do as we want, but what do you suggest ?)

Lakedaemon avatar Apr 24 '21 19:04 Lakedaemon

Those binders are named _.

digama0 avatar Apr 24 '21 23:04 digama0

what about too big ints, what should developper do to have valid proofcheckers in this case ? prefix p : $+$ prec 9999999999999999999999999999999999999999999999999999999999999999;

Say, if there is no limit to ints, and a big int overflows yet the developper doesn't guard agains that. So an (mostly) infinite precedence becomes 1 and all math parses become different. Shouldn't we guard agains that ? Looks to me like an exploitable case

There are two valid behaviors for large precedences (more than 2046).

  1. The number is too large, reject it as soon as it is parsed.
  2. Accept it as a large precedence; it is required to be larger than all smaller numbers you could type there.

That is, the verifier implementation gets to choose what the largest numerical precedence is (as long as it is at least 2046), and all numerical precedences in that range have to be ordered like numbers. If you use a fixed size integer type to hold precedences (which is reasonable and logical), then a reasonable choice for the largest precedence is either MAX or MAX-1, using MAX to hold the special non-numerical precedence max. In this case, during parsing you have to be careful to detect when overflow occurs so you can throw a parsing error. Silently allowing the overflow is not acceptable, because then the precedences will not be in numerical order.

digama0 avatar Apr 25 '21 00:04 digama0

for sort fail test, we should probably check the order of modifyers...

I'm considering making this a run test, but for the moment I believe the spec doesn't allow sort modifiers to be out of order so this is fine.

in the spec, shouldn't math-strings be called something else ?

It's true that you can do lots of things with mathematics, but the name seems fine. It is similar to LaTeX "math mode" as the generic term for "when you are between $ delimiters".

arhhhh line comment don't have to start a line !!!!!! We can have line comments anywhere a space is ok Which makes catching/storing/somewhat preserving line comments a complete headache (for a patcher/transformation tool) That's it...I'm crying ;_;

Yes, this is valid. You should be glad that line comments aren't allowed inside math-strings, I considered that for a while (it can be useful for huge multi-line definitions). But it's a lot harder to reserve -- or some other comment starter inside math strings where almost every character can be used as part of a domain specific operator.

This is part of why --| comments exist. Internal comments like your example should almost always be discarded at the lexer stage, but --| comments immediately before the start of a statement is recognized as a "doc comment" and is transferred along with the statement through tools like from-mm and to-lean.

digama0 avatar Apr 25 '21 00:04 digama0

I'm wondering if I should commit fail tests for 3 modifiers 4 * 5 = 20 tests :/ and fail test for 4 modifiers = 23 tests :/

I think there are only 6 = 4 choose 2 out of order pairs to test: any out of order modifier list needs to contain an adjacent out of order modifier (of which there are 6 possibilities), or a duplicate modifier (of which there are 4 possibilities). There are 16 possible pass tests but I don't think it's important to have them all.

But considering that the main goal here is to catch whether the verifier is using an algorithm that lets these pass, you should only check that the order is correct (by checking that s1234 passes) and that out of order is rejected (so s21, s22, s31, s32, s33, s41, s42, s43, s44 are rejected). I'm pretty sure you will find a bug in one of my parsers with that.

digama0 avatar Apr 25 '21 00:04 digama0

For comments, we probably should test against weirdly place comments (in whitestuff inside statements/dirrectives) consuming lines should be done right also, so that no part of the file disappear because of rogue comments (exploitable ?) but how to do that ?

Yes, surprising comments can definitely cause exploitable issues. (I recall something similar with surprising behavior of the C preprocessor where /* inside the first part of an #if #else #endif can comment out the #else and cause the #if itself to match some other #else, but only when the if condition is true.)

Here's an example of an issue that the current spec may not defend against:

-- I've proved false?
theorem foo: $ F. $;

the secret is that between the two lines is a \r, which may or may not be displayed as a line break but in any case will not terminate the comment (so the theorem is actually in the comment).

digama0 avatar Apr 25 '21 01:04 digama0

I'm wondering if I should commit fail tests for 3 modifiers 4 * 5 = 20 tests :/ and fail test for 4 modifiers = 23 tests :/

I think there are only 6 = 4 choose 2 out of order pairs to test: any out of order modifier list needs to contain an adjacent out of order modifier (of which there are 6 possibilities), or a duplicate modifier (of which there are 4 possibilities). There are 16 possible pass tests but I don't think it's important to have them all.

But considering that the main goal here is to catch whether the verifier is using an algorithm that lets these pass, you should only check that the order is correct (by checking that s1234 passes) and that out of order is rejected (so s21, s22, s31, s32, s33, s41, s42, s43, s44 are rejected). I'm pretty sure you will find a bug in one of my parsers with that.

Damn, I lost my bet :/ So. I'll only be commiting the 16 pass possibilities inside a single pass test and the 6 out of order 2 modifiers fait tests

Lakedaemon avatar Apr 25 '21 04:04 Lakedaemon

What is a "run" test ? Are there other kind of tests ?

On my machine, I run a test, it either fails or pass which is the expected behavior or not, I do not have other options in the end with the testing framework I use (though I may tweek how/when it fails/pass)

Lakedaemon avatar Apr 25 '21 04:04 Lakedaemon