prql icon indicating copy to clipboard operation
prql copied to clipboard

Getting `fmt` to the finish-line

Open max-sixty opened this issue 2 years ago • 13 comments

What's up?

After @aljazerzen built the foundations of this, I have filled a few gaps, and now all the examples format into correct PRQL. But we still have a few gaps in getting to something we can use everywhere.

In order to roll this out fully, it's quite important to be close to 100%, since all PRQL will be formatted this way — in the book, whenever someone saves a file in VS Code, whenever someone runs pre-commit (both in our repo and others'), etc. Without being close to 100%, it's not that instrumentally useful.

  • [ ] We currently remove comments!
  • [x] Do we want to elide parentheses around function calls in {x = (sum foo)} (from https://github.com/PRQL/prql/pull/2803#discussion_r1231216571). This in the docs here. It might require understanding the grandparent node (need to think more)
  • [x] module breaks, which breaks the standard library (though I actually thought we were only going to do files as modules @aljazerzen ?) https://github.com/PRQL/prql/pull/2949
  • [x] Do we want from t=tracks or from t = tracks? I marginally preferred the former — given that we have fairly few separators, having tighter expressions groups things better. (but this is not a strong view, maybe +0.3, and only aesthetics) https://github.com/PRQL/prql/pull/2779
  • [x] Do we want to break inner transforms at all? e.g. this is a lot on one line. But I think my reaction is mostly based on what I'm used to, and I don't think there would be a good rule that lets us handle inner transforms differently. Though we could linebreak earlier (e.g. line break at 40) https://github.com/PRQL/prql/blob/31c3e71b55a0c6ca383e9c1367438352f024fa6b/prql-compiler/tests/integration/snapshots/integration__fmt@distinct_on.prql.snap#L8 https://github.com/PRQL/prql/pull/2950
  • [x] {(-foo)} has extra parentheses, : https://github.com/PRQL/prql/blob/31c3e71b55a0c6ca383e9c1367438352f024fa6b/prql-compiler/tests/integration/snapshots/integration__fmt@distinct_on.prql.snap#L8-L9 — https://github.com/PRQL/prql/pull/2803

max-sixty avatar Jun 07 '23 17:06 max-sixty

module breaks, which breaks the standard library

We use it only internally, because I don't want to have 7 different files with 5 lines in each.

Do we want from t=tracks or from t = tracks?

I'm +0.5 on t=tracks.

Do we want to break inner transforms at all?

Right now, they are broken based on width only. We can add other heuristics, for example "more than two elements of a pipeline will always break into multiple lines".

aljazerzen avatar Jun 07 '23 18:06 aljazerzen

I'm +0.5 on t=tracks.

#2779

max-sixty avatar Jun 10 '23 07:06 max-sixty

One big gap I just realized — comments are erased!

This is something we have to think about — we want to retain some aesthetics; for example:

from t  # comment about `t`
# comment about the select
select f

...these two comments are in the same place as far as AST nodes go — but can't be treated the same.

max-sixty avatar Jun 15 '23 06:06 max-sixty

Jup. My plan was to do something similar to what rustfmt does. But that's hard, so I didn't - yet.

aljazerzen avatar Jun 15 '23 10:06 aljazerzen

Jup. My plan was to do something similar to what rustfmt does. But that's hard, so I didn't - yet.

Interesting link, thanks for sharing. One option would be — since we're also in control of the parser — to have comments in the initial AST. During compilation, we could then run through a cheap remove_comments function...

max-sixty avatar Jun 15 '23 19:06 max-sixty

To share my thinking — part of the reason for spending time here is: without getting all the way there, it's not that instrumentally useful — to auto-format files, we need it to produce reasonable PRQL.

It doesn't have to be completely perfect — we're OK with small line-length changes etc — but it needs to not lose information (e.g. comments), and be acceptable PRQL.

max-sixty avatar Jun 15 '23 19:06 max-sixty

(I edited this a few times, was working through it in my own mind...)

I just added this to the description:

  • Do we want to elide parentheses around function calls in {x = (sum foo)} (from https://github.com/PRQL/prql/pull/2803#discussion_r1231216571). This in the docs here. It might require understanding the grandparent node (need to think more)

As a reminder, the reason we require parentheses around function calls in select x = (sum foo) but not in {x = sum foo} is that the latter has an = after the first ident, and so isn't a function call.

x is aliased to sum foo

select x = (sum foo)

...but here, x is aliased to sum and then select receives a single arg of foo...

select x = sum foo  # wrong!

e is aliased to employees (not to employees (==id))

join e=employees (==id)

Here, x can't be a function call, since after the initial ident there's a =

derive {
  x = sum foo
}

I'm not sure it's possible to elide the parentheses in both without changing the syntax between an assign & alias, or resolving much later in the compilation pipeline

max-sixty avatar Jun 15 '23 19:06 max-sixty

To have a proper formatter, we do need to fix the comments. Your approach with having comments in AST would probably not work well, because it would move comments around, as AST would not capture where exactly the comment was (on the same line as code, in a new line, maybe indented?). This is why I think the rustfmt's approach would be better in the long term.

But, we don't need a proper formatter - at least not now. In the current state, the formatter can have the function of "the idiomatic PRQL standard" - i.e. the definition of what we want the idiomatic PRQL to look like. It can also be used to format book snippets (if they don't contain comments).

aljazerzen avatar Jun 16 '23 06:06 aljazerzen

because it would move comments around, as AST would not capture where exactly the comment was (on the same line as code, in a new line, maybe indented?). This is why I think the rustfmt's approach would be better in the long term.

Hmmm, if I look at the output of rustfmt, it seems to retain:

  • Same line vs. next line
  • Whether there's a linebreak

...but that's all — can't have different identations, can't have more than one linebreak!

(this might still be too much to store in the AST)


But, we don't need a proper formatter - at least not now. In the current state, the formatter can have the function of "the idiomatic PRQL standard" - i.e. the definition of what we want the idiomatic PRQL to look like. It can also be used to format book snippets (if they don't contain comments).

Sure, that's nice :) But much less impactful than being able to format on every save!

max-sixty avatar Jun 16 '23 06:06 max-sixty

Do we want to break inner transforms at all? e.g. this is a lot on one line. But I think my reaction is mostly based on what I'm used to, and I don't think there would be a good rule that lets us handle inner transforms differently. Though we could linebreak earlier

Codegen framework is working only partially, I'm fixing it.

aljazerzen avatar Jul 04 '23 14:07 aljazerzen

FYI I looked at adding comments to the lexer (not the parser), so we could use that to grab the comments. (#4094 was some of this)

It's not difficult in the lexer, but it would involve having lots of .then_ignore(comment.or_not()) throughout the parser (which isn't a herculean effort, but I didn't want to do without being confident we wanted to go this path). Another option would be to remove comments from the lexer output, before passing it into the Parser.

If we can get comments working, they we're in spitting distance of prqlc fmt working. Once it's working then that'll be a big achievement!

max-sixty avatar Jan 16 '24 17:01 max-sixty

For future reference: I've posted a few screenshots with general outline of how it is possible to implement formatting with comments on Discord.

Also, I've ticked off a few things from the checklist:

  • module is now properly formatted,
  • we decided we want from t = tracks
  • inner pipelines now get split onto multiple lines (when needed)

aljazerzen avatar Jan 16 '24 18:01 aljazerzen

I've done a few PRs to add important whitespace to the lexer — I think now complete. We still need to decide how we add that to the output (@aljazerzen gave a few options on discord).


Another thing we might want to add is a way of turning off formatting. That could be as simple as a comment such as # fmt:off, and plausibly # fmt:on.

max-sixty avatar Jan 18 '24 23:01 max-sixty