lean4 icon indicating copy to clipboard operation
lean4 copied to clipboard

[RFC] Pretty Printer / Code Formatting

Open larsk21 opened this issue 3 years ago • 8 comments

I'm currently looking into improving the pretty printer and would like to hear what you think about how we should format Lean code. This is important for many potential editor features, like automatic code formatting, but also features that want to make changes to a Lean file, like automatically generating match cases or adding unimplemented functions of a class to an instance. Here the code formatting would allow the implementation of the feature to work on the Syntax level rather than on the textual level.

Pretty printing in Lean is implemented in three steps: The first step is called delaboration and transforms fully elaborated Expr terms from the Lean core into surface-level Syntax. After that, in the second step, the Syntax is turned into an output independent Format via formatters. Both of these steps are automatically created for new notation / macros (when possible) and can be customized. The resulting Format contains the string contents of the Expr/Syntax representation, but is not fixed to a specific formatting with a fixed target width. This is done in the last step where the (feel free to suggest a name for this step) decides where line breaks should be inserted to get a compact String representation that fits in the available width (if possible). In addition there is the parenthesizer between the delaborator and the formatter that inserts parantheses into a Syntax object where necessary.

Delaborator: Expr to Syntax

From what I've seen so far, I don't think the delaborators require much work, but feel free to correct me here. This step is also not required for the code formatting use case.

Formatter: Syntax to Format

This is probably the most controversial part, since each of us has different preferences on how to format (Lean) code. But this step needs work nonetheless, because some formatters are broken or output code that we would not want to have in our Lean projects. I hope we can find some clean formatting rules that most people can work with. Feel free to make suggestions on how to format specific Lean syntax and I will also post examples here when I encounter them.

Examples of broken formatters:

def foo (xs : Array Nat) :=
  Id.run do
    -- do
      -- something
    for (i, x) in xs.mapIdx (·, ·)do
      bar i x where
  bar := fun _ _ => ()

Formatters are either implemented as Formatters or (most of the time) automatically generated. To control the automatic formatter, there are formatting primitives that can be used when defining a new notation, such as ppLine, which emits a hard line break, or ppSpace, which emits a soft line break that can be printed as either a space or a line break. I'd also like to change some of the names of these formatting primitives to make it more obvious what their purpose is.

?: Format to String

This step alone is sometimes called pretty printer in literature, so I will use this term here as well. Up until now, I looked into three language independent pretty printers, listed below. All of those are based on an algebra desribing the semantics of their Format combinators. (The relevant data type is usually called Doc instead of Format).

  • Hughes' pretty printer, which can be too greedy when processing long lines.
  • Wadler/Leijen's pretty printer, which is also greedy, but delivers good results otherwise. This is the one that is currently used in Lean.
  • Bernardy's pretty printer, which is not greedy and allows dynamic indentation (example below). Even though it's not greedy, it's supposed to have linear complexity in the input size (after optimizations), but it's still aroumd 6 - 10 times slower than the Wadler/Leijen pretty printer. It is also unclear if we would want to use dynamic indentation in Lean at all.

Example of constant and dynamic indentation:

-- constant indentation (all items have a fixed indentation of 2)
foo [
  a,
  b,
  c]

-- dynamic indentation (all items after the first one are aligned with the beginning of the first item)
foo [a,
     b,
     c]

My current idea is to keep the Wadler/Leijen pretty printer, but refactor it to make it more understandable and extensible. One possible extension could be anchors, which are necessary for the alignment of e.g. match cases.

match x? with
-- align all `=>`s
| .some x => _
| .none   => _

Testing the Pretty Printer

You can easily test the current pretty printer by using the linter API. This will show you an info message with the formatted code for each command.

import Lean.Elab.Command
import Lean.PrettyPrinter

open Lean Lean.Elab.Command Lean.PrettyPrinter

def testPrettyPrinter : Linter := fun stx => do
  let width := 80

  let fmt ← liftCoreM <| ppTerm (TSyntax.mk stx)
  let str := fmt.pretty width

  logInfoAt stx str

initialize addLinter testPrettyPrinter

larsk21 avatar Aug 16 '22 07:08 larsk21

You might be interested in this discussion from another issue: https://github.com/leanprover/lean4/issues/369#issuecomment-1017837443

gebner avatar Aug 16 '22 07:08 gebner

I think a good challenge is to get the reformatted Lean code to compile. That is, try scripts/reformat.lean on the whole Lean code base, try to compile it, and see where that breaks.

Another annoying formatting issue is that by simp is always formatted as:

by
  simp

(That is, with a newline even if it's only a single tactic.) This really messes up formatting with terms like c2 _ _ (by rwa [← c1]) hU which would fit on one line other wise. We already have a sepByIndent combinator that does the right formatting. All that's necessary is to migrate by-blocks to this combinator. (And also do, but that's I believe a lot more work.)

gebner avatar Aug 16 '22 14:08 gebner

Another fun issue related to indentation-sensitive syntax:

      (let x : α := a₁
        b x) 

Note that this does not compile!

gebner avatar Aug 18 '22 16:08 gebner

And also do, but that's I believe a lot more work.

Another fun example @larsk21 and I just talked about is

foo fun x => bar do
  ...

We need something stronger than ppAllowUngrouped to solve this. Lars suggested a kind of priority annotation for soft line breaks (though would the breaks for => and do have the same priority? Do we need another attribute for "use innermost such line break"?). Or we can revisit the ideas from Bernardy's pretty printer, as a non-greedy printer should be able to handle this without further annotations I assume. But that would be a much bigger project.

Kha avatar Aug 24 '22 14:08 Kha

Imagine a subsequent argument to bar for extra fun

Kha avatar Aug 24 '22 15:08 Kha

Can you explain what's going wrong in the example above? Is it too much indentation?

I'm not completely happy with the ppAllowUngrouped solution either.

gebner avatar Aug 24 '22 15:08 gebner

Ah sorry, the indentation was supposed to be 2. The example is supposed to be the desirable output, but it defies the "break from outside in" logic all the greedy printers use, which leads to the current output

  foo fun x =>
    bar do
      ...

Actually @larsk21's experimental reimplementation of Format.be does not trigger outer line breaks on an inner hard line break (which could be consider a simple version of line break priorities), but since we were talking about moving away from a hard line break after by/do anyway...

Kha avatar Aug 24 '22 15:08 Kha

Now that I've written it down though, it doesn't even look too bad thanks to ppAllowUngrouped. Might not be the most pressing issue.

Kha avatar Aug 24 '22 15:08 Kha