mm0
mm0 copied to clipboard
first tests for (mm0 parser, mmu parser, mm0+mmu proof checker)
Please tell me what is wrong so that I can fix it and make the next prs behave better. Best regards, Olivier
I pushed some changes.
- I shortened the names of the test files; there is no need to go into details and it's annoying to deal with spaces and long names when calling these from the console.
- The comments are more minimal - in a file that would otherwise only be 6 characters long, a few sentences of comments actually makes the intent of the file less clear. For more complicated and subtle tests a short sentence fragment should be sufficient. Remember, we're going for quantity not quality here.
- The folder organization was changed to
tests/mm0_mmu/fail/test.mm0
. The intent is that the top level directory specifies the test format, meaning that allmm0_mmu
tests can be run in the same way, by passing MM0 + MMU to a verifier. - A
tests/mm0_mmu/run-mm0-hs.sh
script has been added to run a single test from the pass or fail directory using the mm0-hs tool. More scripts should be added to run other tools, anything that can consume an MM0 + MMU pair. I encourage you to add a similar script to invokemm0kt
, and even PR it here. You should include a link to the installation instructions though, for people who don't havemm0kt
on their path. - The organization into phases like matching, registering and so on was dropped. You can indicate this in the name of the test if you like, but I don't think the tests need to be any further organized than this. (I'm expecting the
mm0_mmu/fail
directory to grow to potentially thousands of files without any additional hierarchy.) This is easiest for tooling.
By the way, if it helps any I'm taking https://github.com/leanprover-community/lean/tree/master/tests/lean for inspiration. I don't have *.expected.out
files because diagnostic messages are tool specific, although perhaps the fail
tests can get expected out files for various tools.
Ok, I'll do it that way, to the best of my capabilities.
when matching mm0 statement vs mmu directive, it's a bit like having an expected.out and checking it, I guess. I wonder if/how it could be done for other stages.
Am I supposed to do something to merge those 2 tests (that you beautified) ? Is it finished ? For the next tests, am I supposed to commit them in this pr or am I supposed to open another one ?
The existing tests are fine, but you can add more tests to this PR. You had quite a few more last time you showed me.
ok, will do
those good ?
another batch Waiting for feedback and instructions for the next tests
these are good modulo pass/whitespace2
I'll start working on the next batch then...and slowly resume implementing the improved kotlin patcher for mm0 (and a patch for set.mm.mm0)
bad-math-string2.mm0 serves no purpose and should be removed :/
I don't think there is anything wrong with bad-math-string2, that's a possible parsing error. Same with delimiter $ $ $ $ $ $;
You are the boss, you know best.
The list inside delimiter was to test the fact that math string may hold ANY ascii char except '$' also, those are not multicharacter delimiters, that is just a space separated list of 1 char Ascii chars (who are displayed weirdly by their name when they are not printable)
Indeed, it wouldn't hurt to prevent the use of all those legacy and useless ascii (not printable) char.
I don't think there is anything wrong with bad-math-string2, that's a possible parsing error. Same with
delimiter $ $ $ $ $ $;
oh yeah, nice one.
I'm still a bit confused about if/how I should/could test things. I fear that this is going to be hard to test that a developper implements parsing the right way (with the right structure for stuff).
I made 3 attemps at developping parsers for mm0 (and 1 for metamath), So I tried a few things (some of them stupid), so I have some experience in doing dangerous things : like skipping unparsed balanced parentheses strings to parse them later in a different process. It gives me some ideas to help other developpers prevent the same pitfalls :
- using a third party isWhiteSpace() to skip mm0/mmu whitespace
- implementing/hacking things quickly while discovering the mm0/mmu specs and not handling correctly corner cases...
- misunderstanding the spec... But still, I fear that doing a good testing job is going to be hard
You probably have it easier with the mmb format where everything is (certainly) located in a convenient place and you do not have parsing (structure) issues. Here, that's me being jealous :)
what about too big ints, what should developper do to have valid proofcheckers in this case ? prefix p : $+$ prec 9999999999999999999999999999999999999999999999999999999999999999;
Say, if there is no limit to ints, and a big int overflows yet the developper doesn't guard agains that. So an (mostly) infinite precedence becomes 1 and all math parses become different. Shouldn't we guard agains that ? Looks to me like an exploitable case
in the spec, shouldn't math-strings be called something else ? I'll be doing mostly maths but you have been doing weirder stuff. Wouldn't another term be more suitable ? treeString ? formula ? parsable ? Just a little morning tease to make, sure you are alert and operational first thing in the morning ! :)
Regarding primitives, maybe we should test against balanced parenthesis... but I do not see yet how that could be done simply (maybe later when we get to dynamic parsing/proofs)
For comments, we probably should test against weirdly place comments (in whitestuff inside statements/dirrectives) consuming lines should be done right also, so that no part of the file disappear because of rogue comments (exploitable ?) but how to do that ?
arhhhh line comment don't have to start a line !!!!!!
because of that file ::= (lexeme | whitespace)* whitespace ::= whitestuff+ whitestuff ::= whitechar | line-comment
We can have line comments anywhere a space is ok
so this is valid mm0 stuff
sort-- comment inside a statement s;
Which makes catching/storing/somewhat preserving line comments a complete headache (for a patcher/transformation tool)
That's it...I'm crying ;_;
mmh. So I got whitespace wrong from the start... I have been thinking about how to preserve space/formatting and there might be a reasonnable simple/performant way to do it. I'll have a go at it.
As I do not know yet how to simply test balanced parenthesis stuff, parsing primitives are mostly done I'll now start testing higher level stuff (from the bottom up). First commes mm0 statements and mmu directives (separately)
for sort fail test, we should probably check the order of modifyers...
I'm wondering if I should commit fail tests for 3 modifiers 4 * 5 = 20 tests :/ and fail test for 4 modifiers = 23 tests :/
If we were thorough, yes... now, betting on what you'll want... :) and waiting for further instructions...
Let's see. You are a world expert on logic, formalism and the whole point of mm0 is to make sure that there is more truth, less false truth, less bugs in software and maths (and who knows where else...)...
I'll prepare the 43 additionnal tests, while waiting ^^
when you have this term a: s > t > s;
what names do you give to binders in the mmu file ? (because of alpha renaming, we can do as we want, but what do you suggest ?)
Those binders are named _
.
what about too big ints, what should developper do to have valid proofcheckers in this case ? prefix p : $+$ prec 9999999999999999999999999999999999999999999999999999999999999999;
Say, if there is no limit to ints, and a big int overflows yet the developper doesn't guard agains that. So an (mostly) infinite precedence becomes 1 and all math parses become different. Shouldn't we guard agains that ? Looks to me like an exploitable case
There are two valid behaviors for large precedences (more than 2046).
- The number is too large, reject it as soon as it is parsed.
- Accept it as a large precedence; it is required to be larger than all smaller numbers you could type there.
That is, the verifier implementation gets to choose what the largest numerical precedence is (as long as it is at least 2046), and all numerical precedences in that range have to be ordered like numbers. If you use a fixed size integer type to hold precedences (which is reasonable and logical), then a reasonable choice for the largest precedence is either MAX or MAX-1, using MAX to hold the special non-numerical precedence max
. In this case, during parsing you have to be careful to detect when overflow occurs so you can throw a parsing error. Silently allowing the overflow is not acceptable, because then the precedences will not be in numerical order.
for sort fail test, we should probably check the order of modifyers...
I'm considering making this a run
test, but for the moment I believe the spec doesn't allow sort modifiers to be out of order so this is fine.
in the spec, shouldn't math-strings be called something else ?
It's true that you can do lots of things with mathematics, but the name seems fine. It is similar to LaTeX "math mode" as the generic term for "when you are between $
delimiters".
arhhhh line comment don't have to start a line !!!!!! We can have line comments anywhere a space is ok Which makes catching/storing/somewhat preserving line comments a complete headache (for a patcher/transformation tool) That's it...I'm crying ;_;
Yes, this is valid. You should be glad that line comments aren't allowed inside math-strings, I considered that for a while (it can be useful for huge multi-line definitions). But it's a lot harder to reserve --
or some other comment starter inside math strings where almost every character can be used as part of a domain specific operator.
This is part of why --|
comments exist. Internal comments like your example should almost always be discarded at the lexer stage, but --|
comments immediately before the start of a statement is recognized as a "doc comment" and is transferred along with the statement through tools like from-mm
and to-lean
.
I'm wondering if I should commit fail tests for 3 modifiers 4 * 5 = 20 tests :/ and fail test for 4 modifiers = 23 tests :/
I think there are only 6 = 4 choose 2 out of order pairs to test: any out of order modifier list needs to contain an adjacent out of order modifier (of which there are 6 possibilities), or a duplicate modifier (of which there are 4 possibilities). There are 16 possible pass tests but I don't think it's important to have them all.
But considering that the main goal here is to catch whether the verifier is using an algorithm that lets these pass, you should only check that the order is correct (by checking that s1234 passes) and that out of order is rejected (so s21, s22, s31, s32, s33, s41, s42, s43, s44 are rejected). I'm pretty sure you will find a bug in one of my parsers with that.
For comments, we probably should test against weirdly place comments (in whitestuff inside statements/dirrectives) consuming lines should be done right also, so that no part of the file disappear because of rogue comments (exploitable ?) but how to do that ?
Yes, surprising comments can definitely cause exploitable issues. (I recall something similar with surprising behavior of the C preprocessor where /*
inside the first part of an #if #else #endif
can comment out the #else
and cause the #if
itself to match some other #else
, but only when the if condition is true.)
Here's an example of an issue that the current spec may not defend against:
-- I've proved false?
theorem foo: $ F. $;
the secret is that between the two lines is a \r
, which may or may not be displayed as a line break but in any case will not terminate the comment (so the theorem
is actually in the comment).
I'm wondering if I should commit fail tests for 3 modifiers 4 * 5 = 20 tests :/ and fail test for 4 modifiers = 23 tests :/
I think there are only 6 = 4 choose 2 out of order pairs to test: any out of order modifier list needs to contain an adjacent out of order modifier (of which there are 6 possibilities), or a duplicate modifier (of which there are 4 possibilities). There are 16 possible pass tests but I don't think it's important to have them all.
But considering that the main goal here is to catch whether the verifier is using an algorithm that lets these pass, you should only check that the order is correct (by checking that s1234 passes) and that out of order is rejected (so s21, s22, s31, s32, s33, s41, s42, s43, s44 are rejected). I'm pretty sure you will find a bug in one of my parsers with that.
Damn, I lost my bet :/ So. I'll only be commiting the 16 pass possibilities inside a single pass test and the 6 out of order 2 modifiers fait tests
What is a "run" test ? Are there other kind of tests ?
On my machine, I run a test, it either fails or pass which is the expected behavior or not, I do not have other options in the end with the testing framework I use (though I may tweek how/when it fails/pass)