lean4 icon indicating copy to clipboard operation
lean4 copied to clipboard

feat: labeled break/continue

Open digama0 opened this issue 3 years ago • 0 comments

This is an implementation of labeled break and continue in do notation.

example : IO Unit := do
  for (label := xLoop) x in [1:10] do
    for (label := yLoop) y in [1:10] do
      for z in [1:10] do
        if (x, y, z) = (5, 6, 7) then
          break (label := xLoop)
        else if (x, y, z) = (7, 8, 9) then
          continue (label := yLoop)

It's a relatively straighforward extension to the existing do notation macro, although there is a lot of repeated substructure in the definitions due to the way all the DoResult* types work.

A summary:

  • The syntax of for, while, repeat, break, and continue is extended with an optional (label := l) argument. (This is a somewhat heavyweight syntax, but it's used rarely enough that I don't think it will be an issue.)

  • There are three main contexts where do sequences can appear: at the top level (.regular), inside a for loop (.forIn), and inside a monadic combinator (.nested*).

    • In .regular mode there is nothing to do since no loop labels can be in scope.

    • In .forIn, there used to be two cases: either the loop contains a return or not. Now, the loop may also contain a break/continue to an outer loop, which mostly acts the same as return from the point of view of the inner loop. Because of the same type inference issues that resulted in the DoResult types, there are now four forIn cases:

      • .forIn: only break and continue targeting the current loop, no return
      • .forInR (was .forInWithReturn): only break and continue targeting the current loop, but return is present
      • .forInBC: break and continue may target outer loops, no return
      • .forInRBC: break and continue may target outer loops, return is present

      Each case has a different return type:

      • forIn: m Vars
      • forInR: m (Option ReturnType × Vars)
      • forInBC: m (Option BC × Vars)
      • forInRBC: m (Option (BC ⊕ ReturnType) × Vars)

      Here Option (BC ⊕ ReturnType) represents the type of "places to go after exiting the loop".

      • The none option means a normal exit, no jump
      • The ReturnType option means a return val, exit to the top level
      • BC.continue (n : Nat) means to continue a loop n labels above this one. That is, labeled loops are effectively de bruijn indexed by the type. Unlabeled loops are skipped in the numbering, and continue 0 means to continue the loop just outside the one we just exited.
      • BC.break (n : Nat) similarly means to break from the n'th surrounding loop.

      Actually, Option BC is pre-baked into the type DoStateBC, and Option (BC ⊕ ReturnType) is baked into the type DoStateRBC ReturnType to avoid unnecessary allocations. This does lead to a lot of repetition in the pattern matches though.

    • The third category is .nested, where we have to pack up all the state, including the "pure" value to return to the surrounding context. These are largely unchanged, but the DoResult(S)BC.{break, continue} got a Nat field encoding the loop index to jump to. Note that index 0 here corresponds to unlabeled break (jump to the immediately surrounding loop), while n+1 corresponds to the outer loops.

  • When a labeled break is propagating up the stack, we are mostly only decrementing the loop label index as we exit the loops. But since this is a macro, it has to generate syntax which continues the propagation, and we don't want to explode a match with all n cases to say "match result with | .break 0 => `(break (label := yLoop)) | .break 1 => `(break (label := xLoop))" and so on for all the loops, since they are mostly just going to be packed back into integers in the next macro expansion. So to make this more efficient, we have breakIdx% n. This will break to the n'th labeled loop, not counting the current one (even if it has a label). Then we only have two cases in the match: .break 0 => break and .break (i+1) => breakIdx% i. When the current loop is not labeled this simplifies to just .break i => breakIdx% i.

There are still some things left to make this more than an MVP:

  • Add more documentation
  • Add syntax highlighting / go-to-definition for loop labels
  • unused variable linter for loop labels
  • Check that it's not a performance hit. (For definitions not using labeled loops, the desugaring is almost unchanged, except that the DoResultPRBC and DoResultSBC types have gotten a little larger, with an extra contstructor argument which is always set to 0 and ignored in the match.)

I suspect this will be a difficult PR to review, so let me know if there is anything I can do to help.

digama0 avatar Aug 02 '22 04:08 digama0