Construct an automatic code formatter

Open athas opened this issue 6 years ago • 15 comments

Tools such as gofmt are very useful for quickly reformatting code according to some coding style. I think we need the same for Futhark. It's a bit more complicated to do this for an expression-oriented language than a statement-oriented one, but there is prior art we can look at (like Brittany for Haskell).

While constructing futhark fmt is not exactly a small project, it is going to be fairly isolated from the rest of the compiler, and so does not require much prerequisite knowledge. I think the best approach is to construct the formatted program based on a token stream produced by the lexer. A first step will then be to add a lexer mode where comments are preserved as tokens.

athas avatar Jul 24 '19 15:07 athas

Depending on how we want to tackle this, I don't think the token stream is that useful. Other code formatters for expression-based languages such as elm-format and ormolu use user-introduced whitespace as hints to the formatter. Consider for instance the following piece of code:

aLongFunction = aFunction thatHas manyLong weirdArguments blaBla blAasdfasdf asdf asdfasdf

That line is longer than 80 characters, but perhaps the user wants to allow that in this specific case? Or perhaps the user doesn't mind lines longer than 80 characters. The above-mentioned formatters require users to manually enter whitespace in order to trigger a reformat to multiple lines in a case like this. For instance, by inserting a single newline in the above

aLongFunction = aFunction thatHas manyLong weirdArguments blaBla blAasdfasdf asdf
  asdfasdf

Ormolu will reformat the entire thing to

aLongFunction =
  aFunction
    thatHas
    manyLong
    weirdArguments
    blaBla
    blAasdfasdf
    asdf
    asdfasdf

elm-format allows the additional style of having the first argument on the same line as the function, as in the following:

aLongFunction =
  aFunction thatHas
    manyLong
    weirdArguments
    blaBla
    blAasdfasdf
    asdf
    asdfasdf

But if the user puts thatHas on a new line, elm-format will format it like ormolu does.

The same user-guided heuristics are used throughout the formatter, including for type signatures, record syntax, lists, and more.
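
As a rough illustration of the user-guided heuristic described above (a toy sketch in Python, not how Ormolu or elm-format are actually implemented): the only thing taken from the user's layout is whether the binding already spans more than one line. The names `layout_application` and `format_binding` are made up for this example, and splitting on whitespace only works because every token here is a plain identifier.

```python
def layout_application(name, func, args, user_broke_line, indent=2):
    """Lay out `name = func arg1 arg2 ...`.

    If the user's source had a line break inside the binding, put every
    argument on its own line (the Ormolu-style result shown above).
    Otherwise keep everything on one line, however long it is.
    """
    if not user_broke_line:
        return " ".join([name, "=", func, *args])
    pad = " " * indent
    lines = [f"{name} =", f"{pad}{func}"]
    lines += [f"{pad * 2}{arg}" for arg in args]
    return "\n".join(lines)


def format_binding(source):
    """Hypothetical driver for a single `name = func args...` binding."""
    # The user's hint: did they put a newline anywhere inside the binding?
    user_broke_line = "\n" in source.strip()
    name, eq, func, *args = source.split()
    if eq != "=":
        raise ValueError("expected a binding of the form `name = func args...`")
    return layout_application(name, func, args, user_broke_line)
```

With this rule, the one-line version of `aLongFunction` is left alone regardless of its length, while the version with a single inserted newline is expanded to one argument per line.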

I haven't really studied our tokenizer, but I assume that it doesn't provide the necessary whitespace information to decide stuff like this? The other question is, do we want similar behavior to ormolu/elm-format, or do we want something else?

NB. I think go fmt uses the same kind of user-guided heuristics to format code. Consider these two examples with equivalent code that formats differently.

Munksgaard avatar Sep 18 '20 11:09 Munksgaard

I would prefer a formatter that does not touch linebreaks.

The token stream is useful because each token is associated with a start and end position in the file. We can then extract the character sequences in between the tokens and consider them to be whitespace.

We could also extend the tokenizer to produce tokens corresponding to line comments. These would then be filtered away by the parser, but used by the formatter.
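
A minimal sketch of that idea in Python, assuming tokens carry character offsets (the real Futhark lexer may track line and column positions instead) and using a toy regex lexer in place of the real one. `Token`, `lex`, `parser_view`, and `gaps` are hypothetical names for illustration only.

```python
import re
from typing import NamedTuple


class Token(NamedTuple):
    kind: str   # "comment" or "word" (a toy classification)
    text: str
    start: int  # offset of the first character
    end: int    # offset one past the last character


# A line comment runs from "--" to the end of the line; any other run of
# non-whitespace characters counts as a "word" in this toy lexer.
TOKEN_RE = re.compile(r"(?P<comment>--[^\n]*)|(?P<word>\S+)")


def lex(source):
    """Tokenize `source`, keeping line comments as tokens."""
    return [Token(m.lastgroup, m.group(), m.start(), m.end())
            for m in TOKEN_RE.finditer(source)]


def parser_view(tokens):
    """What the parser would see: the same stream with comments filtered out."""
    return [t for t in tokens if t.kind != "comment"]


def gaps(source, tokens):
    """Text between consecutive tokens, recovered from their positions."""
    return [source[a.end:b.start] for a, b in zip(tokens, tokens[1:])]
```

Because comments are tokens here, every gap is pure whitespace, so the formatter can inspect it (for example, to see whether it contains a newline) without having to parse comments out of it.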

athas avatar Sep 18 '20 11:09 athas

I would prefer a formatter that does not touch linebreaks.

I am unsure what you mean by this.

The token stream is useful because each token is associated with a start and end position in the file. We can then extract the character sequences in between the tokens and consider them to be whitespace.

So each token has a line/column associated with it? In that case, yes, you are right.

We could also extend the tokenizer to produce tokens corresponding to line comments. These would then be filtered away by the parser, but used by the formatter.

Yes, that should be necessary, i.e. in cases like this:

aLongFunction = aFunction
   thatHas manyLong weirdArguments blaBla blAasdfasdf asdf asdfasdf -- a comment

futhark fmt should be able to put the comment somewhere (on the last line?).

Munksgaard avatar Sep 18 '20 11:09 Munksgaard

Philip Munksgaard writes:

I would prefer a formatter that does not touch linebreaks.

I am unsure what you mean by this.

I would prefer a formatter that does not insert or remove any line breaks, but merely repositions the tokens within the existing lines. This is mostly because I have not seen any formatter for a functional language that is good at splitting expressions. Ormolu certainly isn't.

Although I would probably be fine with inserting linebreaks between top-level definitions, or moving 'in's to the next line, and such.

To a large degree, I am more interested in a light-weight auto-indenter (that also does small-scale fixups) than an Ormolu-style large-scale complete reformatter. In particular because writing local indentation rules in futhark-mode is a total PITA.
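
To make the distinction concrete, here is a small Python sketch of the "do not touch line breaks" end of the spectrum. It only normalises spacing inside each line and keeps the existing indentation; a real auto-indenter would also recompute the indentation, and would need a token stream to avoid touching whitespace inside string literals and comments. `tidy_line` and `tidy` are hypothetical names.

```python
def tidy_line(line):
    """Normalise spacing inside one line without joining or splitting it."""
    words = line.split()
    if not words:
        return ""  # blank lines lose their trailing whitespace
    # Keep whatever indentation the user already chose.
    indent = line[:len(line) - len(line.lstrip())]
    return indent + " ".join(words)


def tidy(source):
    """Apply `tidy_line` to every line; the number of lines never changes."""
    return "\n".join(tidy_line(line) for line in source.split("\n"))
```

The output always has exactly as many lines as the input, which is the property asked for here.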

athas avatar Sep 18 '20 12:09 athas

Ah, I understand. Yes, that's definitely a simpler option and it obviates most of my concerns. Personally, I don't mind the many linebreaks in ormolu- or elm-formatted code, but I understand that not everyone feels the same way.

In particular because writing local indentation rules in futhark-mode is a total PITA.

That's true, although I must admit that in my personal workflow (even if it may not generalize to other people's) in Haskell or Elm, I don't really use the indentation modes of the language in question. Instead I insert rudimentary whitespace where I deem necessary and let the formatter perform the necessary formatting and indentation. I therefore don't personally think it's such a big problem, but I admit that not everyone works the same way I do :-)

Munksgaard avatar Sep 18 '20 13:09 Munksgaard

Is anyone working on this still? If not then I might be up for attempting to write a formatter in Haskell.

joshniemela avatar Jan 26 '23 07:01 joshniemela

Is anyone working on this still? If not then I might be up for attempting to write a formatter in Haskell.

I don't think so. Please, go ahead!

Munksgaard avatar Jan 26 '23 09:01 Munksgaard

I have a student who might want to work on this for their bachelor's project in the upcoming semester, but don't let that hold you back.

athas avatar Jan 26 '23 09:01 athas

Wait, is it you?

athas avatar Jan 26 '23 09:01 athas

It is not. I'm probably going to do something Futhark-related in my own bachelor's project, but that's two years from now. Feel free to ask him if he wants to collaborate with me or if he wants to write it entirely on his own though.

joshniemela avatar Jan 26 '23 09:01 joshniemela

Do we have a style guide specifically in regards to Futhark? The only one I can find is about Haskell.

joshniemela avatar Jan 26 '23 19:01 joshniemela

This is the closest we have, but it is very superficial and only talks about the style in the prelude.

athas avatar Jan 26 '23 20:01 athas

Alright, I'll have another look at this on Monday, but I'd reckon the first step is to actually design a rigorous style guide for Futhark so that it is possible to design a formatter around said guide.

joshniemela avatar Jan 26 '23 21:01 joshniemela

I personally prefer formatters like black that ignore how code is formatted to begin with. That way, you don't have to think about prettiness at all while typing, and your code has a consistent style. I don't know if other people feel similarly, or if that's practical for a functional language like Futhark.

wstevick avatar Apr 24 '23 15:04 wstevick

I suspect that is impractical. Functional languages tend to involve very deeply nested expressions, and I don't know of a tool that can fold those as intelligently as a human user. This is mainly based on my experiences with Haskell formatting tools.

athas avatar Apr 26 '23 12:04 athas