prql
prql copied to clipboard
Add comments to `fmt`
What's up?
We discussed comments (and associated aesthetic items, such as significant newlines) on Discord. A couple of options for how to do this:
- Add to the AST — change the Parser to parse comments, attaching them to AST nodes. For example, there would be a
commentsfield on the Expr struct containing the comments before the expression.- This keeps context local — each node can format itself without requiring the context of the full query
- But it would mean lots of changes to the parser. Some nodes aren't
Exprand so would require a different design (e.g. a comment above an annotation wouldn't fit anywhere; similarly we'd need to add this toStmttoo.)
- Require passing in the source into
WriteSource, use the source to find and write the comments while writing the query.- This is less dev work, because comments don't need to go through the PL.
- But it means formatting something requires global context, which is less modular and harder to maintain
IIUC these are the main two approaches. There are variations — for example having comments part of a CST but not AST; though in our case we don't have a CST.
Any thoughts?
Discussed on the dev call — let's go for the second approach — "Require passing in the source into WriteSource"
What is the status on this? Is someone actively working on getting comments working?
Ah, I haven't been keeping this issue up-to-date, thanks for the ping @m-span .
Relevant links: https://github.com/PRQL/prql/pull/4397#issuecomment-2187146337, https://github.com/PRQL/prql/pull/4639
Briefly:
- The existing formatter uses PL. But PL doesn't have comments, which are only stored in tokens
- We could rewrite the formatter to use LR tokens, referencing the PL to understand where things should go. That would be lots of rewriting
- I tried two different approaches to avoid that work (each referenced above)
- Format the PL into PRQL (as before), and attempt to write interspersed comments by referencing LR Tokens. This is hard, since the writer needs to carry a lot of state (in particular, because the PL is nested, there are multiple PL Expr that are contiguous to a comment LR token, so we don't want to write out a comment token whenever a comment follows an Expr)
- Add comments into the PL AST. This almost works, but fails on one narrow case, and I can see some other looming issues
I'm not confident on the best next steps. There's a mini-goal of getting DocComments in the PL, which are much easier than comments, and uses a subset of #4639. Then I think we need to decide whether to a) try and take one of those approaches and add enough hacks that it works or b) rewrite the formatter from the perspective of LR tokens.