dhall-haskell dhall-format removes comments

Example file:

--- This is dhall file
{ foo = "jax" }

The Output it gives:

{ foo = "jax" }

Sep 29 '17 08:09 psibi

Going through the code, it seems a known issue. :) Anyway keeping this open.

Sep 29 '17 08:09 psibi

One thing I can do as an immediate workaround is to have the formatter preserve leading whitespace and comments at the beginning of the file. Then if you need comments for a subexpression you can at least have the workaround of splitting it out into another file

Sep 29 '17 13:09 Gabriella439

@Gabriel439 What's the main challenge to tackle in order to fix this? As far as I remember the main problem is that it'd be hard to associate comment fragments to other AST nodes, right? (as in: a comment line that comes before some code goes with the next node, but what to do in case of a comment fragment on the same line? Also how about the alignments? And does this mean that comments would slow down all operations, since it's a whole lot of packing-unpacking dummy AST comment nodes?)

Nov 01 '18 09:11 f-f

The GHC approach is not to out comments in the AST, but have a separate map Map (NodeType, SrcLoc) Annotation. So as you traverse the AST you have SrcLocs and can find annotations for the node you're at. That's one approach. Or we could have a trees that grow approach and erase comments from the AST for anything but formatting.

On Thu, 1 Nov 2018, 9:53 am Fabrizio Ferrai <[email protected] wrote:

@Gabriel439 https://github.com/Gabriel439 What's the main challenge to tackle in order to fix this? As far as I remember the main problem is that it'd be hard to associate comment fragments to other AST nodes, right? (as in: a comment line that comes before some code goes with the next node, but what to do in case of a comment fragment on the same line? Also how about the alignments? And does this mean that comments would slow down all operations, since it's a whole lot of packing-unpacking dummy AST comment nodes?)

— You are receiving this because you are subscribed to this thread. Reply to this email directly, view it on GitHub https://github.com/dhall-lang/dhall-haskell/issues/145#issuecomment-434989843, or mute the thread https://github.com/notifications/unsubscribe-auth/AABRjlJH8aGjbXdkKSoOcAC8gEWfHHQ6ks5uqsSsgaJpZM4PoYNA .

Nov 01 '18 10:11 ocharles

There is also a third solution, which is a "low-tech" formatter that doesn't actually parse the AST but rather just scans the text and formats as it goes. I believe this is how go fmt works.

However, I think the simplest and most robust solution is the second one that @ocharles mentioned which is just to preserve parsed whitespace in the syntax tree using the Noted constructors.

That still doesn't get us all the way, though, because intelligently preserving comments when formatting code is still difficult even when your AST preserves whitespace and comments. Here are some pathological cases to consider when designing a comment-preserving formatting algorithm:

The formatter wants to format an expression as one line since it's less than 80 characters, but there is a line comment in the way:
```
{ x = 1  -- !
, y = 2
}
```
Same as the previous example, but now there is a multi-line comment in the way:
```
{ x = 1  {- !
            !
         -}
, y = 2
}
```
The formatter wants to reindent a value but doing so might break the indentation of a multi-line comment:
```
[ 1
, 2
...
,     99 {- !
            !
         -}
]
```
Same as the previous example, except that te user wrote the multi-line comment using multiple single-line comments (i.e. -- ), so the formatter doesn't know that they are supposed to be indented together:
```
[ 1
, 2
...
,     99 -- !
         -- !
]
```
User inserts comments preceding syntactic elements that we want to horizontally align:
```
[ 1
, 2
...
{- ! -} , 99
]
```

User adds a comment that goes beyond the desired column limit:

1 -- !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!

User adds whitespace before a comment pushing it beyond the desired column limit:

1                                                                                          -- !

Nov 01 '18 14:11 Gabriella439

I'd like to have comments on record fields if possible. This doesn't work right now.

let Block
	: Type
	= { id :
		  Text
		  -- Used in radio selects, checkbox selects
	  , label :
		  Text
	  , fieldType :
		  Text
	  , options :
		  List { id : Text, label : Text }
	  }

gets nuked.

Mar 17 '19 02:03 macalinao

If we compare different styles and techniques of formatting: one aspect of gofmt is that is preserves line breaks. So it may compact many linebraks to /n/n, but other than that, the user has some freedom on formatting. gofmt then mainly manages indentation, not breaks.

I think this is useful. The fact that some dhall maps are formatted in one line and some are not (based on line lengths) is not something I enjoy.

This may warrant a separate issue, but if you want to preserve linebreaks and want to go for AST style formatting and include comments, linebreaks would also have to be added to AST.

Jul 06 '19 08:07 feliksik

@feliksik: The Haskell AST does preserve the original source code for all nodes, including whitespace/newlines, but it's still not clear to me if only dealing with indentation is enough.

For example, one difference between Dhall and Go is that Dhall is less whitespace-sensitive than Go is. For example, in Go, this is syntactically valid:

func main() {
	fmt.Println("Hello, world!")
}

... but this is not valid

func main()
{
	fmt.Println("Hello, world!")
}

If the latter were valid then Go would have to delete the newline in between main() and the opening brace to format the code according to the standard style

Because Dhall is not as restrictive as Go, it would have to accommodate all sorts of weirdness if it took greater care to preserve the original newlines.

Jul 06 '19 22:07 Gabriella439

@Gabriel439 thank you for your response.

Because Dhall is not as restrictive as Go, it would have to accommodate all sorts of weirdness if it took greater care to preserve the original newlines.

This is a good point! Indeed, only dealing with indentation may not be enough to make it great. If I correctly understand, you would need some rules (that are eventually arbitrary), and these would need to be a bit more sophisticated than with go.

I tried to get this straight for myself, so I can just as well share the examples I made:

This would be fine (it's current formatting):

λ(x : Text) → { a = x, b = x }

As would possibly:

λ(x : Text) →
  { a = x
  , b = x
  }

but this would probably be considered too weird:

λ(
  x : Text) →
  {
   a = x,
   b = x
  }

But you may consider such rules are undesirable (apart from the existing 80-char-line rewrite logic).

Note that go also implements such heuristic, a bit: The following is gofmt'ed, but one newline would be removed if the comment is removed:

package main

import "fmt"

func main() {
        a := map[string]string{"k1": "v1", "k2": "v2"}

        b := map[string]string{
                "k1": "v1", "k2": "v2",
        }

        c := map[string]string{
                "k1": "v1",
                "k2": "v2",
        }
        d := map[string]string{
                "k1": "v1", "k2": // note that this newline WOULD be removed if you remove this comment
                "v2",
        }
        e := map[string]string{
                "k1": "v1",

                "k2": "v2", // extra line before
        }

        fmt.Println(
                "Hello, world!", a, b,

                c,
                d, e,
        )
}

So to summarize: if I get it right, there is 2 taking home points here:

line rewrite heuristics can be formulated, and it's arbitrary how far you want to take this
the main point of this ticket (preserving comments) could possibly be implemented (and made easier?) by only respecting newlines that are preceded by a comment (--).

Jul 07 '19 18:07 feliksik

@feliksik: Is there a document or code describing the go fmt algorithm? It would help a lot if I could crib from that when working on the dhall format equivalent

Jul 07 '19 19:07 Gabriella439

Hmm I am not aware of such a document. The code is not too long, though: https://golang.org/src/cmd/gofmt/gofmt.go . I didn't study it yet, though.

I did find http://journal.stuffwithstuff.com/2015/09/08/the-hardest-program-ive-ever-written/ , which elaborates how hard this problem can become, based on the requirements. But this aims for much more sophisticated results.

By all means, don't let this spoil the work that can be done on preserving comments in the shorter term. It will be of great value, and much more important than the newlines.

Jul 07 '19 21:07 feliksik

Personally, I do like the current dhall fmt behaviour, especially since dhall is by design a line-break and leading whitespace insensitive language.

But I think this discussion is derailing the original issue too much and should move to a different issue.

Jul 09 '19 11:07 Profpatsch

What about starting with simple rules for preserving comments such as "only comments in their own lines are preserved"?

I think this would go a long way, at least for documenting functions and types at the top level

Jul 09 '19 15:07 lorenzo

I really liked the idea of preserving newlines that follow a comment. I'm currently trying to see how easy it would be to implement

Jul 09 '19 15:07 Gabriella439

Alright, here's the rough idea I have for how to preserve one trailing comment per AST node.

First, some background: every node in the AST currently keeps track of its source code, including trailing whitespace (but not leading whitespace).

So what we can do for each node is to check the trailing whitespace and behave differently under the following three conditions:

No trailing comment

In this case, don't preserve the trailing whitespace for that node at all, which is the current behavior
Trailing comment beginning on the same line

Preserve the comment and align it against the right-hand side of the expression (indenting/dedenting the comment as necessary if it's a multi-line comment)

For example, this expression:
```
let x =   1   {- Foo
                 Bar
              -}



in  x
```
... would be formatted as:
```
let x =
      1 {- Foo
           Bar
        -}

in  x
```
Trailing comment beginning on another line

Preserve the comment and align it below the expression to the left-hand side (indent/dedenting the comment as necessary)

For example, this expression:
```
let x = 1


{- Foo
   Bar
-}

in  x
```
... would be formatted as:
```
let x =
      1
      {- Foo
         Bar
      -}

in  x
```

We could also add special cases for preserving comments right before a let binding or a record/union key since we can preserve their interior whitespace, too.

Jul 09 '19 16:07 Gabriella439

Alright, I made some progress on this here by fixing dhall format to at least preserve comments inside of a let binding (but not immediately before):

https://github.com/dhall-lang/dhall-haskell/pull/1273

So, for example, it will preserve the comment if you do this:

let x = 1

let {- Example
       comment
    -}
    y = 1

in  x + y

... but it will not preserve the comment if you do this:

let x = 1

{- Example
   comment
-}
let y = 1

in  x + y

To provide a quick update: the general tack I suggested in my previous comment did not work. The issue is that the same trailing whitespace would be preserved by multiple nodes in the syntax tree, so it was difficult to get a preserved comment to show up exactly once (i.e. there were many errors with comments that were duplicated or missing).

However, the trick that does work (which is partially implemented in the above pull request) is just adding Src spans directly to the Expr constructors everywhere that there is whitespace within the grammar. The above pull request only implements it for Let but I can slowly add support for preserving comments to more constructors as time goes on.

For now, I'm only doing this for Let to minimize disruption to downstream libraries and I'll try to add support for a few more constructors with each release to ease the migration process for people using the Expr API. Once I've covered all constructors then we can close out this issue.

Sep 02 '19 18:09 Gabriella439

In case anyone's interested in working on this, it should be much easier to preserve more comments once https://github.com/dhall-lang/dhall-haskell/pull/1483 is merged:

Add a Maybe s field to one of the Expr constructors, similarly to what we have in Binding.
Wrap the corresponding whitespace or nonemptyWhitespace parser in Dhall.Parser.Expression with src, just like what we do for Binding.
Follow the type errors and update the pretty printer to render the comment.
Add a format test.

Step 3 is a bit onerous, but it shouldn't be difficult.

Nov 01 '19 14:11 sjakobi

I had one of those "ah-ha! why didn't I think of this earlier?" moments this morning of a simple work-around, and since nobody's published it yet, I'll put it here.

Instead of:

{ field =
    -- foo/bar is for bazzes
    "value"
}

write:

{ field =
    let note = "foo/bar is for bazzes"
    in "value"
}

Of course, this only works for workflows which aren't running dhall lint regularly on their codebases, so it's an anti-pattern because dhall lint should be generally encouraged. But hey, works for me.

Jan 21 '20 08:01 ari-becker

@ari-becker: Another work-around that dhall lint will not remove is this:

{ field =
    let -- foo/bar is for bazzes
        result = "value"

    in  result
}

Jan 21 '20 16:01 Gabriella439

Now that https://github.com/dhall-lang/dhall-haskell/pull/1908 is merged, the parser output also contains the comments that precede labels in records and record types. e.g. the A and B comments in these examples:

{ {- A -} x = 0, {- B -} y = 1 }

{ {- A -}
  x : T
, {- B -}
  y : U
}

However the pretty-printer still needs to be updated to output these comments. @german1608's initial work (including tests) on this is on this branch, mixed with parts of #1908 and #1926.

If there are no other volunteers, I'd take a stab at finishing the the pretty-printer update. :)

It should also be also be fairly simple to retain the comments between labels and the field value or type now, e.g.

{ x {- A -} = {- B -} 0 }

and

{ x {- A -} : {- B -} T }

Jul 18 '20 12:07 sjakobi

@sjakobi this issue hasn't been resolved in1.34, has it ? I was just very exited believing record comments would not be removed by the linter anymore. My hope has been high on this one because some of our users have expressed their disappointment when they realize their comments were garbaged on record fields in a military way.

Aug 04 '20 19:08 PierreR

@PierreR dhall format doesn't preserve those comments on 1.34.0. The work to record those on the AST is done, but the formatter needs an update to not remove them.

Aug 04 '20 19:08 german1608

@german1608 thanks for the info. Will look forward to it. Amazing work !

Aug 04 '20 19:08 PierreR

@PierreR A lot of preparatory work has been done, most importantly https://github.com/dhall-lang/dhall-haskell/issues/1465.

I hope to get to the pretty-printer next week. But I also wouldn't mind if someone else would take over.

Aug 04 '20 23:08 sjakobi

Now that #2021 has been merged, dhall format can preserve comments in several positions in record literals and record types. The expression below demonstrates all the positions where comments are now preserved:

{- Header
-}
let {- A -} letBinding
    {- B -} : {- C -} T
    = {- D -} x

let recordType =
      { {- E -}
        x
        {- F -}
        : {- G -}
          T
      }

let pun = x

let record =
      { {- H -}
        simple
        {- I -}
        = {- J -}
          z
      , {- K -}
        pun
        {- L -}
      ,   {- M -}
          dotted
          {- N -}
        . {- O -}
          fields
          {- P -}
        = {- R -}
          x
      }

in  {=}

I assume that these changes will be included in v1.35.0 (#2016).

As @TristanCacqueray has pointed out in #2021, it would be relatively easy to preserve comments in union types now. E.g.

< {- A -} X {- B -} : {- C -} Y >

Also, @german1608 has already laid the ground work for preserving comments on

functions: λ({- A -} a {- B -} : {- C -} T) -> e
function types: ∀({- A -} a {- B -} : {- C -} T) -> e
field selections: e . {- A -} x {- B -}

Sep 10 '20 17:09 sjakobi

@sjakobi: Yes, I plan to cut the 1.35.0 branch tonight, which will incorporate the formatting improvements

Sep 10 '20 18:09 Gabriella439

EDIT: Retracting my original comment as it was badly formulated and sounded like a complaint which I didn't intend.

What I meant to say was "It would be great to get this fully resolved to facilitate greater adoption." :smile:

Oct 26 '20 15:10 hanshoglund

@hanshoglund: I think my main question is: which places do you still need to preserve comments? The most recent release preserves comments on let bindings and record value/type fields

Oct 26 '20 15:10 Gabriella439

@Gabriel439 That is a great improvement, but to minimize negative first-user reactions I think it is still crucial to ensure that all comments accepted by the parser are also preserved by the formatter. Maybe the solution is to make the parser accept fewer comments?

I'm sure this is being worked on, just wanted to add another data point :D.

Oct 26 '20 16:10 hanshoglund-da

I can for sure say that it has also caused some weird reactions on our team and sth. we had to explicitly educate everyone about. To the point that we had long discussions on figuring out the best "tricks" like e.g. artificial let bindings for how we can use comments to document our dhall types. This was before the most recent improvements in 1.35+ and introduction of the dhall docs (which is still very rough).

Anyhow, I'm all for a "canonical formatter" and the benefits of this hugely outweigh the cons. In particular I also appreciate that the formatter enforces e.g. comment whitespace/alignment to some degree.

I like the idea proposed by @hanshoglund-da to think about how the "write comment & format file" experience can be improved to yield less surprising results.

Oct 26 '20 16:10 JohannesRudolph

dhall-haskell dhall-haskell copied to clipboard

dhall-format removes comments

dhall-haskell
dhall-haskell copied to clipboard