dhall-haskell
dhall-haskell copied to clipboard
dhall-format removes comments
Example file:
--- This is dhall file
{ foo = "jax" }
The Output it gives:
{ foo = "jax" }
Going through the code, it seems a known issue. :) Anyway keeping this open.
One thing I can do as an immediate workaround is to have the formatter preserve leading whitespace and comments at the beginning of the file. Then if you need comments for a subexpression you can at least have the workaround of splitting it out into another file
@Gabriel439 What's the main challenge to tackle in order to fix this? As far as I remember the main problem is that it'd be hard to associate comment fragments to other AST nodes, right? (as in: a comment line that comes before some code goes with the next node, but what to do in case of a comment fragment on the same line? Also how about the alignments? And does this mean that comments would slow down all operations, since it's a whole lot of packing-unpacking dummy AST comment nodes?)
The GHC approach is not to out comments in the AST, but have a separate map Map (NodeType, SrcLoc) Annotation. So as you traverse the AST you have SrcLocs and can find annotations for the node you're at. That's one approach. Or we could have a trees that grow approach and erase comments from the AST for anything but formatting.
On Thu, 1 Nov 2018, 9:53 am Fabrizio Ferrai <[email protected] wrote:
@Gabriel439 https://github.com/Gabriel439 What's the main challenge to tackle in order to fix this? As far as I remember the main problem is that it'd be hard to associate comment fragments to other AST nodes, right? (as in: a comment line that comes before some code goes with the next node, but what to do in case of a comment fragment on the same line? Also how about the alignments? And does this mean that comments would slow down all operations, since it's a whole lot of packing-unpacking dummy AST comment nodes?)
— You are receiving this because you are subscribed to this thread. Reply to this email directly, view it on GitHub https://github.com/dhall-lang/dhall-haskell/issues/145#issuecomment-434989843, or mute the thread https://github.com/notifications/unsubscribe-auth/AABRjlJH8aGjbXdkKSoOcAC8gEWfHHQ6ks5uqsSsgaJpZM4PoYNA .
There is also a third solution, which is a "low-tech" formatter that doesn't actually parse the AST but rather just scans the text and formats as it goes. I believe this is how go fmt
works.
However, I think the simplest and most robust solution is the second one that @ocharles mentioned which is just to preserve parsed whitespace in the syntax tree using the Noted
constructors.
That still doesn't get us all the way, though, because intelligently preserving comments when formatting code is still difficult even when your AST preserves whitespace and comments. Here are some pathological cases to consider when designing a comment-preserving formatting algorithm:
-
The formatter wants to format an expression as one line since it's less than 80 characters, but there is a line comment in the way:
{ x = 1 -- ! , y = 2 }
-
Same as the previous example, but now there is a multi-line comment in the way:
{ x = 1 {- ! ! -} , y = 2 }
-
The formatter wants to reindent a value but doing so might break the indentation of a multi-line comment:
[ 1 , 2 ... , 99 {- ! ! -} ]
-
Same as the previous example, except that te user wrote the multi-line comment using multiple single-line comments (i.e.
--
), so the formatter doesn't know that they are supposed to be indented together:[ 1 , 2 ... , 99 -- ! -- ! ]
-
User inserts comments preceding syntactic elements that we want to horizontally align:
[ 1 , 2 ... {- ! -} , 99 ]
-
User adds a comment that goes beyond the desired column limit:
1 -- !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
-
User adds whitespace before a comment pushing it beyond the desired column limit:
1 -- !
I'd like to have comments on record fields if possible. This doesn't work right now.
let Block
: Type
= { id :
Text
-- Used in radio selects, checkbox selects
, label :
Text
, fieldType :
Text
, options :
List { id : Text, label : Text }
}
gets nuked.
If we compare different styles and techniques of formatting: one aspect of gofmt is that is preserves line breaks. So it may compact many linebraks to /n/n
, but other than that, the user has some freedom on formatting. gofmt then mainly manages indentation, not breaks.
I think this is useful. The fact that some dhall maps are formatted in one line and some are not (based on line lengths) is not something I enjoy.
This may warrant a separate issue, but if you want to preserve linebreaks and want to go for AST style formatting and include comments, linebreaks would also have to be added to AST.
@feliksik: The Haskell AST does preserve the original source code for all nodes, including whitespace/newlines, but it's still not clear to me if only dealing with indentation is enough.
For example, one difference between Dhall and Go is that Dhall is less whitespace-sensitive than Go is. For example, in Go, this is syntactically valid:
func main() {
fmt.Println("Hello, world!")
}
... but this is not valid
func main()
{
fmt.Println("Hello, world!")
}
If the latter were valid then Go would have to delete the newline in between main()
and the opening brace to format the code according to the standard style
Because Dhall is not as restrictive as Go, it would have to accommodate all sorts of weirdness if it took greater care to preserve the original newlines.
@Gabriel439 thank you for your response.
Because Dhall is not as restrictive as Go, it would have to accommodate all sorts of weirdness if it took greater care to preserve the original newlines.
This is a good point! Indeed, only dealing with indentation may not be enough to make it great. If I correctly understand, you would need some rules (that are eventually arbitrary), and these would need to be a bit more sophisticated than with go.
I tried to get this straight for myself, so I can just as well share the examples I made:
This would be fine (it's current formatting):
λ(x : Text) → { a = x, b = x }
As would possibly:
λ(x : Text) →
{ a = x
, b = x
}
but this would probably be considered too weird:
λ(
x : Text) →
{
a = x,
b = x
}
But you may consider such rules are undesirable (apart from the existing 80-char-line rewrite logic).
Note that go also implements such heuristic, a bit: The following is gofmt'ed, but one newline would be removed if the comment is removed:
package main
import "fmt"
func main() {
a := map[string]string{"k1": "v1", "k2": "v2"}
b := map[string]string{
"k1": "v1", "k2": "v2",
}
c := map[string]string{
"k1": "v1",
"k2": "v2",
}
d := map[string]string{
"k1": "v1", "k2": // note that this newline WOULD be removed if you remove this comment
"v2",
}
e := map[string]string{
"k1": "v1",
"k2": "v2", // extra line before
}
fmt.Println(
"Hello, world!", a, b,
c,
d, e,
)
}
So to summarize: if I get it right, there is 2 taking home points here:
- line rewrite heuristics can be formulated, and it's arbitrary how far you want to take this
- the main point of this ticket (preserving comments) could possibly be implemented (and made easier?) by only respecting newlines that are preceded by a comment (
--
).
@feliksik: Is there a document or code describing the go fmt
algorithm? It would help a lot if I could crib from that when working on the dhall format
equivalent
Hmm I am not aware of such a document. The code is not too long, though: https://golang.org/src/cmd/gofmt/gofmt.go . I didn't study it yet, though.
I did find http://journal.stuffwithstuff.com/2015/09/08/the-hardest-program-ive-ever-written/ , which elaborates how hard this problem can become, based on the requirements. But this aims for much more sophisticated results.
By all means, don't let this spoil the work that can be done on preserving comments in the shorter term. It will be of great value, and much more important than the newlines.
Personally, I do like the current dhall fmt behaviour, especially since dhall is by design a line-break and leading whitespace insensitive language.
But I think this discussion is derailing the original issue too much and should move to a different issue.
What about starting with simple rules for preserving comments such as "only comments in their own lines are preserved"?
I think this would go a long way, at least for documenting functions and types at the top level
I really liked the idea of preserving newlines that follow a comment. I'm currently trying to see how easy it would be to implement
Alright, here's the rough idea I have for how to preserve one trailing comment per AST node.
First, some background: every node in the AST currently keeps track of its source code, including trailing whitespace (but not leading whitespace).
So what we can do for each node is to check the trailing whitespace and behave differently under the following three conditions:
-
No trailing comment
In this case, don't preserve the trailing whitespace for that node at all, which is the current behavior
-
Trailing comment beginning on the same line
Preserve the comment and align it against the right-hand side of the expression (indenting/dedenting the comment as necessary if it's a multi-line comment)
For example, this expression:
let x = 1 {- Foo Bar -} in x
... would be formatted as:
let x = 1 {- Foo Bar -} in x
-
Trailing comment beginning on another line
Preserve the comment and align it below the expression to the left-hand side (indent/dedenting the comment as necessary)
For example, this expression:
let x = 1 {- Foo Bar -} in x
... would be formatted as:
let x = 1 {- Foo Bar -} in x
We could also add special cases for preserving comments right before a let
binding or a record/union key since we can preserve their interior whitespace, too.
Alright, I made some progress on this here by fixing dhall format
to at least preserve comments inside of a let
binding (but not immediately before):
https://github.com/dhall-lang/dhall-haskell/pull/1273
So, for example, it will preserve the comment if you do this:
let x = 1
let {- Example
comment
-}
y = 1
in x + y
... but it will not preserve the comment if you do this:
let x = 1
{- Example
comment
-}
let y = 1
in x + y
To provide a quick update: the general tack I suggested in my previous comment did not work. The issue is that the same trailing whitespace would be preserved by multiple nodes in the syntax tree, so it was difficult to get a preserved comment to show up exactly once (i.e. there were many errors with comments that were duplicated or missing).
However, the trick that does work (which is partially implemented in the above pull request) is just adding Src
spans directly to the Expr
constructors everywhere that there is whitespace within the grammar. The above pull request only implements it for Let
but I can slowly add support for preserving comments to more constructors as time goes on.
For now, I'm only doing this for Let
to minimize disruption to downstream libraries and I'll try to add support for a few more constructors with each release to ease the migration process for people using the Expr
API. Once I've covered all constructors then we can close out this issue.
In case anyone's interested in working on this, it should be much easier to preserve more comments once https://github.com/dhall-lang/dhall-haskell/pull/1483 is merged:
- Add a
Maybe s
field to one of theExpr
constructors, similarly to what we have inBinding
. - Wrap the corresponding
whitespace
ornonemptyWhitespace
parser inDhall.Parser.Expression
withsrc
, just like what we do forBinding
. - Follow the type errors and update the pretty printer to render the comment.
- Add a format test.
Step 3 is a bit onerous, but it shouldn't be difficult.
I had one of those "ah-ha! why didn't I think of this earlier?" moments this morning of a simple work-around, and since nobody's published it yet, I'll put it here.
Instead of:
{ field =
-- foo/bar is for bazzes
"value"
}
write:
{ field =
let note = "foo/bar is for bazzes"
in "value"
}
Of course, this only works for workflows which aren't running dhall lint
regularly on their codebases, so it's an anti-pattern because dhall lint
should be generally encouraged. But hey, works for me.
@ari-becker: Another work-around that dhall lint
will not remove is this:
{ field =
let -- foo/bar is for bazzes
result = "value"
in result
}
Now that https://github.com/dhall-lang/dhall-haskell/pull/1908 is merged, the parser output also contains the comments that precede labels in records and record types. e.g. the A and B comments in these examples:
{ {- A -} x = 0, {- B -} y = 1 }
{ {- A -}
x : T
, {- B -}
y : U
}
However the pretty-printer still needs to be updated to output these comments. @german1608's initial work (including tests) on this is on this branch, mixed with parts of #1908 and #1926.
If there are no other volunteers, I'd take a stab at finishing the the pretty-printer update. :)
It should also be also be fairly simple to retain the comments between labels and the field value or type now, e.g.
{ x {- A -} = {- B -} 0 }
and
{ x {- A -} : {- B -} T }
@sjakobi this issue hasn't been resolved in1.34, has it ? I was just very exited believing record comments would not be removed by the linter anymore. My hope has been high on this one because some of our users have expressed their disappointment when they realize their comments were garbaged on record fields in a military way.
@PierreR dhall format
doesn't preserve those comments on 1.34.0. The work to record those on the AST is done, but the formatter needs an update to not remove them.
@german1608 thanks for the info. Will look forward to it. Amazing work !
@PierreR A lot of preparatory work has been done, most importantly https://github.com/dhall-lang/dhall-haskell/issues/1465.
I hope to get to the pretty-printer next week. But I also wouldn't mind if someone else would take over.
Now that #2021 has been merged, dhall format
can preserve comments in several positions in record literals and record types. The expression below demonstrates all the positions where comments are now preserved:
{- Header
-}
let {- A -} letBinding
{- B -} : {- C -} T
= {- D -} x
let recordType =
{ {- E -}
x
{- F -}
: {- G -}
T
}
let pun = x
let record =
{ {- H -}
simple
{- I -}
= {- J -}
z
, {- K -}
pun
{- L -}
, {- M -}
dotted
{- N -}
. {- O -}
fields
{- P -}
= {- R -}
x
}
in {=}
I assume that these changes will be included in v1.35.0 (#2016).
As @TristanCacqueray has pointed out in #2021, it would be relatively easy to preserve comments in union types now. E.g.
< {- A -} X {- B -} : {- C -} Y >
Also, @german1608 has already laid the ground work for preserving comments on
- functions:
λ({- A -} a {- B -} : {- C -} T) -> e
- function types:
∀({- A -} a {- B -} : {- C -} T) -> e
- field selections:
e . {- A -} x {- B -}
@sjakobi: Yes, I plan to cut the 1.35.0
branch tonight, which will incorporate the formatting improvements
EDIT: Retracting my original comment as it was badly formulated and sounded like a complaint which I didn't intend.
What I meant to say was "It would be great to get this fully resolved to facilitate greater adoption." :smile:
@hanshoglund: I think my main question is: which places do you still need to preserve comments? The most recent release preserves comments on let
bindings and record value/type fields
@Gabriel439 That is a great improvement, but to minimize negative first-user reactions I think it is still crucial to ensure that all comments accepted by the parser are also preserved by the formatter. Maybe the solution is to make the parser accept fewer comments?
I'm sure this is being worked on, just wanted to add another data point :D.
I can for sure say that it has also caused some weird reactions on our team and sth. we had to explicitly educate everyone about. To the point that we had long discussions on figuring out the best "tricks" like e.g. artificial let bindings for how we can use comments to document our dhall types. This was before the most recent improvements in 1.35+ and introduction of the dhall docs
(which is still very rough).
Anyhow, I'm all for a "canonical formatter" and the benefits of this hugely outweigh the cons. In particular I also appreciate that the formatter enforces e.g. comment whitespace/alignment to some degree.
I like the idea proposed by @hanshoglund-da to think about how the "write comment & format file" experience can be improved to yield less surprising results.