ocaml icon indicating copy to clipboard operation
ocaml copied to clipboard

Implicit braces in ocamltest trees

Open lukemaurer opened this issue 1 year ago • 5 comments

The new tree syntax for ocamltest greatly improves the common case where several test cases are sequentially composed, each depending on the last. We can now write such tests as

test1;
test2;
test3;
test4;
...
test27;

without increasing the indentation level. This makes sense because these tests are not “nested” in any meaningful sense, any more than consecutive statements in imperative code are nested. Practically speaking, the great advantage is that we can now add test1a between test1 and test2 without increasing indentation (or other markers of nested-ness).

Unfortunately, this only works if test1a belongs in the sequence with the other tests. But it might be completely independent: for example, it might check an error case that is only meaningful after test1 but has no bearing on the other tests. In that case, we have three unpalatable options:

  1. Write it in the sequence anyway. This obscures the structure of the test cases and causes the entire rest of the sequence to be skipped if test1a fails.
  2. Indent every test in the sequence after test1a and risk emotional damage the next time you try to rebase.
  3. Add braces around every test in the sequence after test1a but don't indent, then try to invent a convention that will help you keep the formatting straight.

The annoying thing here is that test4 is still not meaningfully “more nested” than test1. It just happens that there's an extra branch jutting out of the sequence somewhere between test1 and test4, and the syntax makes that everyone's problem.

My proposed syntax is this:

test1;
{
 test1a;
}
test2;
test3;
test4;
...
test27;

This is exactly as if we'd put braces around everything from test2 to test27, but it once again expresses that this is simply a sequence of tests in a chain. It's just that now there are some tests that branch off from the sequence.

lukemaurer avatar Jun 20 '24 09:06 lukemaurer

Technical remarks:

  • Naively I expect an ambiguity in your new parser.mly grammar, where statement_list nonempty_statement_list tree_list could be parsed as either (statement_list nonempty_statement_list) tree_list or statement_list (nonempty_statement_list tree_list). I think that the semantics are equivalent but that the first choice gives a simpler AST. Which one did conflict resolution choose? Can we make the parser unambiguous?

  • tree_list used to but just a list of tree but now it is something weirder, I think that a better name is in order -- parallel-tail?

Design remark: we scratched our head to understand what would be a good semantics for a; (b | c); d, and the current syntax was designed precisely to rule this case out -- the parallel branching can occur only at the very end of sequences. But it does not seem impossible that we would want to choose a semantics for this case in the future, because in most cases (b | c) are in fact mutually exclusive and there is a natural, well-defined semantics. If we wanted to add this to the current syntax we would naturally propose something like: a; { b } { c }; d. I am uneasy about the fact that your proposal is eating the same syntactic space.

I think that deep down your complaint is that the parallel construct has a nesting syntax that encourages indentation (otherwise the closing delimiter is dangling weirdly). Could the problem be solved by adding an infix form to introduce parallelism? For example, I wonder if we could define reasonable precedence rules for | such that

test1;
{ test1a; } |
test2;
test3;
test4;
...
test27;

is an unambiguous way to write what you have in mind. (Note: there is a weird relation to the use of foo & background jobs in shell syntax.)

gasche avatar Jun 20 '24 13:06 gasche

Technical remarks:

  • Naively I expect an ambiguity in your new parser.mly grammar, where statement_list nonempty_statement_list tree_list could be parsed as either (statement_list nonempty_statement_list) tree_list or statement_list (nonempty_statement_list tree_list). I think that the semantics are equivalent but that the first choice gives a simpler AST. Which one did conflict resolution choose? Can we make the parser unambiguous?

It was indeed ambiguous (this was ported from flambda-backend, which apparently isn't running Menhir with --strict). Pushed a fix so that the new clause is only allowed after at least one tree.

  • tree_list used to but just a list of tree but now it is something weirder, I think that a better name is in order -- parallel-tail?

Sure, pushed a commit renaming tree_list to parallel_tail.

I think that deep down your complaint is that the parallel construct has a nesting syntax that encourages indentation (otherwise the closing delimiter is dangling weirdly).

That's about right, yeah. In particular, you do want to indent sometimes, but maintaining a file in which there are both indenting and non-indenting braces sounds pretty miserable.

Could the problem be solved by adding an infix form to introduce parallelism? For example, I wonder if we could define reasonable precedence rules for | such that

test1;
{ test1a; } |
test2;
test3;
test4;
...
test27;

is an unambiguous way to write what you have in mind. (Note: there is a weird relation to the use of foo & background jobs in shell syntax.)

This could work. One awkward thing is that I do still want test2 to be a descendant of test1, so I would want A {B} {C} | D to be A ({B} {C} | D) rather than (A {B} {C}) | D.

If we use the PR as is, we could later express “do this after all the branches” as

test1;
{ test1a; }
{ test1b; }
finally { test1z; }

Then test1z depends on both test1a and test1b.

lukemaurer avatar Jun 20 '24 16:06 lukemaurer

finally could be reasonably interpreted as "do this even if the parallel branches fail", which is not the right semantics. But your suggestion looks reasonable and certainly we could think of another keyword, for example then, to restart sequencing after the parallel part.

I agree that my proposal with | looks a bit ugly -- I haven't found a formulation that I find really nice.

On the other hand, I find that your proposal has counter-intuitive semantics: if we show the test script to people who haven't read the ocamltest documentation (nobody does), are they going to guess the real semantics? (And how would they realize that their assumption is wrong?)

gasche avatar Jun 20 '24 18:06 gasche

Hmm. Maybe we could flip it around a bit and add a keyword that embeds a tree into a statement list? Maybe branch? So my example would look like

test1;
branch { test1a; }
test2;
test3;
...

(I was going to suggest fork at first but, like finally, it has baggage.) That avoids stealing syntax and hopefully suggests to the reader that things that come after test1a don't depend on it.

lukemaurer avatar Jun 21 '24 11:06 lukemaurer

So for example

s1;
s2;
branch { s3; }
s4;
branch { s5; }
s6;
s7;

would be equivalent to

s1;
s2;
{ s3; }
{ s4;
  { s5; }
  { s6;
    s7; }
}

This makes sense to me : it reads better than my proposal, and there is less potential for confusion than with your initial proposal.

Before you do the work of adapting the implementation, I think that it would be nice to have the opinion of other people who formulated opinions on ocamltest syntax in the past, for example @shindere, @damiendoligez or @Octachron.

Note: we do have the property that s1; branch { s2; } branch { s3; } is equivalent to s1; { s2; } { s3; } which is what I would expect.

gasche avatar Jun 21 '24 11:06 gasche

My impression is that other people are not planning to give a different opinion on this topic, so I propose to assume that my dislike of your current syntax forms a (singleton) consensus. Should we close the current PR and let you propose a different syntax when available? (Of course closing is not a final decision, people can always discuss further to revisit.)

gasche avatar Jul 04 '24 09:07 gasche

I agree that the singleton syntax feels too ambiguous. The branch seems okish to me, but I am not sure how common is the situation that you describe. Do you have a rough idea of how many tests are you hoping to simplify with the new syntax?

Octachron avatar Jul 04 '24 12:07 Octachron

I agree that the singleton syntax feels too ambiguous. The branch seems okish to me, but I am not sure how common is the situation that you describe. Do you have a rough idea of how many tests are you hoping to simplify with the new syntax?

I had a look through the existing tests and I don't see any that are obviously in a state to rewrite this way right this second. It would make it easier to add to existing tests going forward, though. For example, link-test/test.ml builds up mylib.cma and then uses it and a few other files for further tests. One could at that point run a few tests that only cover .cma functionality, but currently you'd have to indent the whole rest of the block to do so (or add the new tests into the sequence, which obscures the structure and may mean carefully resetting the environment after the new tests).

lukemaurer avatar Jul 04 '24 14:07 lukemaurer