smlfmt icon indicating copy to clipboard operation
smlfmt copied to clipboard

Vertically align tokens

Open HarrisonGrodin opened this issue 4 years ago • 18 comments

We should at least align delimiters, imo, e.g.:

val apple  = 0
and orange = 1

val () =
  case foo of
    (NONE, _)        => ()
  | (_, NONE)        => ()
  | (SOME x, SOME y) => ()

datatype apple = Foo | Bar
and orange     = Baz

I would also prefer right-aligning keywords before name definitions so names are aligned, e.g.:

eqtype apple
   and orange

datatype apple  = Foo | Bar
     and orange = Baz

Unsure if this is a popular opinion, though.

HarrisonGrodin avatar Dec 25 '21 23:12 HarrisonGrodin

@shwestrick Thinking about how to implement this. Instinctively, I want to compute the length of each column and (left or right) align by padding with spaces. However, I'm not sure how this would work with comments and non-atomic components (e.g., ('a, 'b, 'verylong...) apple). Do you think something like this would be workable, or perhaps this alignment should be a doc operation?

HarrisonGrodin avatar Dec 26 '21 20:12 HarrisonGrodin

Yeah I've thought a bit about this and I'm not sure what the best solution is.

The simplest solution, from the layout perspective, is something like

datatype doc =
...
| Table of {numRows: int, numCols: int, elem: (int * int) -> token}

And you could imagine augmenting it with padding/justification info, like a function int -> {Left | Right | Center} which describes how to justify each column.

But this kinda sucks for actually writing formatter cases.

shwestrick avatar Dec 26 '21 23:12 shwestrick

It might be possible to instead allow each element of the table to be a doc, with nested tables. Not sure how to lay those out.

shwestrick avatar Dec 26 '21 23:12 shwestrick

I would also prefer right-aligning keywords before name definitions so names are aligned

Personally I find this really difficult to read, because I look to the left to see where the next declaration begins, rather than at the names.

This maybe isn't too bad?

eqtype apple
and    orange

datatype apple  = Foo | Bar
and      orange = Baz

shwestrick avatar Dec 26 '21 23:12 shwestrick

It might be possible to instead allow each element of the table to be a doc, with nested tables. Not sure how to lay those out.

Hmm, yeah... I think we'd need to support at least a few tokens in a row (Token.t Seq.t?), due to type parameters:

datatype ('a, 'b) apple = Foo 'a | Bar of 'b
and orange              = Baz of (int, int) apple

HarrisonGrodin avatar Dec 26 '21 23:12 HarrisonGrodin

Personally I find this really difficult to read, because I look to the left to see where the next declaration begins, rather than at the names.

Ah, I see. My eye tends to center around the sort/name, I think (datatype foo, and bar), which makes right alignment read more naturally.

I wonder how each of our approaches fare as the length of type names vary.

Some sample cases:

datatype foo = Foo
and bar      = Bar

(* left/left *)
datatype foo = Foo
and      bar = Bar

(* right/left, right/right *)
datatype foo = Foo
     and bar = Bar
datatype ('a, 'b) apple = Foo
and orange              = Bar

(* left/left *)
datatype ('a, 'b) apple = Foo
and      orange         = Bar

(* right/left *)
datatype ('a, 'b) apple = Foo
     and orange         = Bar

(* right/right *)
datatype ('a, 'b) apple = Foo
     and         orange = Bar
datatype apple      = Foo
and ('a, 'b) orange = Bar

(* left/left *)
datatype apple           = Foo
and      ('a, 'b) orange = Bar

(* right/left *)
datatype apple           = Foo
     and ('a, 'b) orange = Bar

(* right/right *)
datatype           apple = Foo
     and ('a, 'b) orange = Bar

(Edit: for type/eqtype/datatype, I suppose there's also the question of whether type parameters would be in a column of their own...)

HarrisonGrodin avatar Dec 27 '21 00:12 HarrisonGrodin

Just realizing - the left vs. right justification issue only arises in a few relatively uncommon cases for and: afaict val rec, type, eqtype, datatype, exception, structure, signature, functor. The other cases, val and fun (most common?) don't matter anyway, since they're three letters.

EDIT: where, too.

HarrisonGrodin avatar Dec 27 '21 00:12 HarrisonGrodin

Just remembered another use case for this (in specs):

val twelve : int
and thirteen : int

(* formatted to *)
val twelve   : int
and thirteen : int

If values are declared together, should probably align their :s, so asymmetries aren't introduced simply due to variable name length.

HarrisonGrodin avatar Dec 31 '21 18:12 HarrisonGrodin

Perhaps also:

type t =
  { apple : t
  , watermelon : t
  , orange : t
  }

(* formatted to *)
type t =
  { apple      : t
  , watermelon : t
  , orange     : t
  }

HarrisonGrodin avatar Dec 31 '21 20:12 HarrisonGrodin

@HarrisonGrodin I'm thinking we could try adding this to the interface

val table: {numRows: int, numCols: int, elem: {row:int, col:int} -> doc} -> doc

I'm picturing that each doc element would be forcibly flattened, so that there are no "multirows" of the resulting layout. Tables themselves would be rigid. And, we would throw a runtime error if an element of a table is rigid (disallowing e.g. nested tables but of course other problematic elements as well).

I don't think this would be too difficult to implement. But it leaves the question of whether or not it is usable for writing pretty cases. One hangup is that you would need to be careful to not put possibly rigid things inside a table. Thoughts?

shwestrick avatar Jan 04 '22 15:01 shwestrick

Here's one place where this would be a pain.

val x : foo
val y : bar longbar very long bar
          definitely needs multiple
          lines bar
val z : baz

Essentially this is a case of "wrapping" within a cell, but of course it's not the same semantics as table cell wrapping.

shwestrick avatar Jan 04 '22 16:01 shwestrick

And, we would throw a runtime error if an element of a table is rigid (disallowing e.g. nested tables but of course other problematic elements as well).

This seems problematic to me; it seems important that all code have at least some autoformatted output.

Forcible flattening sounds somewhat reasonable, although this probably shouldn't apply to the last doc on a line. For example:

(* should align ='s *)
val pat1 = exp1
and pat2 = exp2

val x   =
  case bar of
    nil     => 0
  | x :: xs => x + prod xs
and foo = exp2

A similar issue shows up for many other and-able keywords, such as fun, type, structure, and datatype, all of which can have multi-line/possibly rigid docs at the end of a row.

HarrisonGrodin avatar Jan 04 '22 18:01 HarrisonGrodin

Oh, just realized - we probably want nested tables, as long as they're nested only at the end of another table, in cases like this:

type short            =
  { label1 : t1
  , label2 : t2
  }
and  veryveryverylong =
  { label3 : t3
  , label4 : t4
  , label5 : t5
  }

and this:

case foo of
  pat       => 1
| longpat   => (
    case bar of
      pat       => 2
    | longpat   => 3
    | longerpat => 4
  )
| longerpat => 5

HarrisonGrodin avatar Jan 04 '22 18:01 HarrisonGrodin

And, we would throw a runtime error if an element of a table is rigid (disallowing e.g. nested tables but of course other problematic elements as well).

This seems problematic to me; it seems important that all code have at least some autoformatted output.

Well, keep in mind these are implementation details. It restricts how you use tables, not what the pretty-printer can do. If there was a runtime error due to rigid inside table, that would be a bug in the implementation of the pretty printer.

Forcible flattening sounds somewhat reasonable, although this probably shouldn't apply to the last doc on a line.

You can do this without changing the table definition. If the last doc shouldn't be flattened, then we can just take it out of the table, and prettify it separately.

type short            =
  { label1 : t1
  , label2 : t2
  }
and  veryveryverylong =
  { label3 : t3
  , label4 : t4
  , label5 : t5
  }

More of an aesthetics point, but I think this is a great example of where we need to be careful with vertical alignment. The first equals definitely should not be way out there like that ;)

shwestrick avatar Jan 04 '22 18:01 shwestrick

If there was a runtime error due to rigid inside table, that would be a bug in the implementation of the pretty printer.

Ohh, I see what you mean. :+1:

You can do this without changing the table definition. If the last doc shouldn't be flattened, then we can just take it out of the table, and prettify it separately.

Not sure I follow this - if the last doc in each row shouldn't be flattened, how could we insert it after each table row if the table was a single compound doc?

The first equals definitely should not be way out there like that ;)

Hmm... I can't decide if I agree or not! I like the principle that if definitions are given with and, they're "combined", and thus asymmetries induced due to name lengths should be modded out by; this means that programmers feel comfortable using any names, rather than trying to align word lengths to bring out the symmetry (e.g., hi/lo instead of high/low). My instinct is that it's unusual that you would have short and veryveryverylong defined together via and, but if you do, it makes sense for the ='s to be aligned.

HarrisonGrodin avatar Jan 04 '22 19:01 HarrisonGrodin

Another case for potential alignment I was just thinking about - of in datatype declarations. For example:

datatype foo =
  Apple of int
| Pineapple of string

(* should perhaps be formatted to *)
datatype foo =
  Apple     of int
| Pineapple of string

I think this is probably sensible, since it's dual to aligning the :s in a record definition.

HarrisonGrodin avatar Jan 04 '22 19:01 HarrisonGrodin

You can do this without changing the table definition. If the last doc shouldn't be flattened, then we can just take it out of the table, and prettify it separately.

Not sure I follow this - if the last doc in each row shouldn't be flattened, how could we insert it after each table row if the table was a single compound doc?

You would need to have in hand the last doc of the table anyway, in order to define what the elements of the table are. So all you need to do is just not give that particular doc to the table, and instead do something like beside (mytable, makePretty last).

it makes sense for the ='s to be aligned.

I think I agree only if the =s are touching, i.e. no extra rows in between. Otherwise, the symmetry is lost.

Actually, this might be a nice general principle: vertical alignment only when the aligned tokens are on adjacent lines.

shwestrick avatar Jan 04 '22 20:01 shwestrick

You would need to have in hand the last doc of the table anyway, in order to define what the elements of the table are. So all you need to do is just not give that particular doc to the table, and instead do something like beside (mytable, makePretty last).

Think I might be missing something here; given last1 through lastn, we'd want beside (row1, last1) $$ ... $$ beside (rown, lastn), right?

I think I agree only if the =s are touching, i.e. no extra rows in between. Otherwise, the symmetry is lost.

Actually, this might be a nice general principle: vertical alignment only when the aligned tokens are on adjacent lines.

Hmm, that's interesting; I think I'm okay with either. (I still feel a bit of the symmetry across multiple lines, but your suggestion might be the less obscure style.) It would make table formatting easier, I suppose (given multiline rows, just display in sequence with no alignment?). What should happen if some of the =s are touching but not others, though? e.g.,

type foo = int
and  short =
  { label1 : t1
  , label2 : t2
  }
and  veryveryverylong =
  { label3 : t3
  , label4 : t4
  , label5 : t5
  }

Should the first two =s be aligned; all three; none? Similar issue with =>'s in case expressions:

case foo of
  pat =>
    long multiline expression
| longpat => 1
| longerpat => 2

I'm perfectly happy with always aligning the =>s no matter what (and I'd prefer this to no alignment), since on a large scale it suggests that the case arms are parallel to one another, even if one happens to be particularly long. Perhaps there's a clever middle ground, though?

HarrisonGrodin avatar Jan 04 '22 21:01 HarrisonGrodin