brittany Support extension UnicodeSyntax

brittany discards Unicode symbols that GHC supports. For example, it will format something like

f :: A → B → C

as

f :: A -> B -> C

Mar 16 '18 02:03 ccrusius

I don't use that extension, but it would certainly be nice to have it supported in brittany.

Related features might include flags or commands to convert between Unicode and non-Unicode versions. (Current behaviour is like const False; you are asking for id, and we might want to let the user choose between those two and const True, applied to "unicodeness".) I am not sure how much Unicode syntax there is or how consistently its users apply it.
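
To make the const False / id / const True idea concrete, here is a minimal sketch. The UnicodePolicy type and the useUnicode function are hypothetical names made up for illustration; brittany has no such option as far as this thread shows.

```haskell
-- Hypothetical setting; not an existing brittany option.
data UnicodePolicy
  = ForceAscii    -- const False: always emit ASCII (the behaviour reported above)
  | Preserve      -- id: keep whatever the source used
  | ForceUnicode  -- const True: always emit Unicode
  deriving (Eq, Show)

-- Given whether the source token was Unicode, decide whether the output is.
useUnicode :: UnicodePolicy -> Bool -> Bool
useUnicode ForceAscii   = const False
useUnicode Preserve     = id
useUnicode ForceUnicode = const True

main :: IO ()
main = mapM_ report [ForceAscii, Preserve, ForceUnicode]
  where
    -- Show each policy's decision for an ASCII source token and a Unicode one.
    report p = print (p, map (useUnicode p) [False, True])
```

Modelling the setting as a three-way choice keeps "preserve" separate from the two forced modes, so the conversion feature and the bug fix can share one code path.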

Mar 16 '18 17:03 lspitzner

I agree - it would be nice if the "Unicoding" could be automatically done by brittany.

As for "how much," brittany already keeps Unicode characters for a bunch of things. My guess would be that it is currently screwing up the ones in the UnicodeSyntax extension list, probably because those are the only ones hardcoded in GHC itself (as opposed to being provided by other packages).

Mar 16 '18 22:03 ccrusius

So I just wrote a patch that makes brittany automatically transform UnicodeSyntax tokens into their Unicode equivalents whenever -XUnicodeSyntax is passed as a GHC option, which I think is reasonable. I'll think about a pull request - it requires some paperwork at work.

Mar 17 '18 02:03 ccrusius

Now for a little bit more detail, in case those who are more familiar with the code have better ideas: the problem is that brittany, when building function signatures, expressions, and such, writes down hardcoded strings for the following tokens: ::, ->, forall, <-, and =>. It does that in various places in the following files:

src/Language/Haskell/Brittany/Internal/Layouters/Decl.hs
src/Language/Haskell/Brittany/Internal/Layouters/Expr.hs
src/Language/Haskell/Brittany/Internal/Layouters/Pattern.hs
src/Language/Haskell/Brittany/Internal/Layouters/Stmt.hs
src/Language/Haskell/Brittany/Internal/Layouters/Type.hs

Changing the hardcoded strings to parameterized ones is not a big deal: it took me about an hour, and that was the first time I saw the code.
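
For readers who want a picture of what "parameterized" could mean here, a rough sketch follows. It is not the actual patch: the Token type and renderToken are hypothetical names, and the Bool stands in for "was -XUnicodeSyntax passed". The Unicode forms are the ones GHC accepts for these five tokens.

```haskell
import System.IO (hSetEncoding, stdout, utf8)

-- The five tokens listed above, as a hypothetical enumeration.
data Token = DoubleColon | RightArrow | Forall | LeftArrow | DoubleArrow
  deriving (Eq, Show, Enum, Bounded)

-- Render a token as ASCII (False) or as its UnicodeSyntax form (True).
renderToken :: Bool -> Token -> String
renderToken False t = case t of
  DoubleColon -> "::"
  RightArrow  -> "->"
  Forall      -> "forall"
  LeftArrow   -> "<-"
  DoubleArrow -> "=>"
renderToken True t = case t of
  DoubleColon -> "∷"
  RightArrow  -> "→"
  Forall      -> "∀"
  LeftArrow   -> "←"
  DoubleArrow -> "⇒"

main :: IO ()
main = do
  hSetEncoding stdout utf8  -- so the Unicode forms print regardless of locale
  -- Print each token in both renderings, ASCII first.
  mapM_ (\t -> putStrLn (renderToken False t ++ "  " ++ renderToken True t))
        [minBound .. maxBound]
```

With something like this, every place that currently writes a literal "::" or "->" would call the renderer instead, which is why the change is mechanical but touches several layouter files.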

But a better idea would be to keep the proper tokens. I think the AST has the information there (for example, AnnDcolonU for Unicode, versus AnnDcolon for ASCII). That requires diving into the code to an extent I probably won't find the time for. Hopefully it is not too hard for those who know the code inside-out!

Mar 17 '18 03:03 ccrusius

> But a better idea would be to keep the proper tokens. I think the AST has the information there (for example, AnnDcolonU for Unicode, versus AnnDcolon for ASCII).

Right, and brittany already has the utils to query for the presence of those annotations for relevant nodes (via hasAnnKeyword). For example, see this test for the AnnSimpleQuote annotation thingy.

You need to be sure to run this test on the correct node, but by using --dump-ast-full this should not be too hard to get right.
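
As a toy illustration of the annotation-based approach (not brittany code): the Keyword type below only mirrors the names of a few GHC annotation keywords, and Node, nodeHasKeyword, dcolonFor and arrowFor are hypothetical stand-ins. The real hasAnnKeyword works on located AST nodes inside brittany's own monad and has a different signature.

```haskell
import System.IO (hSetEncoding, stdout, utf8)

-- Local stand-in mirroring a few of GHC's annotation keyword names.
data Keyword
  = AnnDcolon | AnnDcolonU
  | AnnRarrow | AnnRarrowU
  deriving (Eq, Show)

-- Toy AST node: just the annotation keywords attached to it.
newtype Node = Node { nodeKeywords :: [Keyword] }

-- Simplified analogue of an "is this keyword attached to the node?" query.
nodeHasKeyword :: Keyword -> Node -> Bool
nodeHasKeyword kw = elem kw . nodeKeywords

-- Emit the token form the source used, falling back to ASCII.
dcolonFor :: Node -> String
dcolonFor node
  | nodeHasKeyword AnnDcolonU node = "∷"
  | otherwise                      = "::"

arrowFor :: Node -> String
arrowFor node
  | nodeHasKeyword AnnRarrowU node = "→"
  | otherwise                      = "->"

main :: IO ()
main = do
  hSetEncoding stdout utf8  -- so the Unicode forms print regardless of locale
  let asciiSig   = Node [AnnDcolon, AnnRarrow]
      unicodeSig = Node [AnnDcolonU, AnnRarrowU]
      render n   = unwords ["f", dcolonFor n, "A", arrowFor n, "B"]
  putStrLn (render asciiSig)
  putStrLn (render unicodeSig)
```

The point of the sketch is only that the decision is made per node from what the parser recorded, so mixed ASCII and Unicode sources would round-trip unchanged.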

Mar 20 '18 23:03 lspitzner

If someone would like to contribute a patch to make brittany keep the Unicode syntax (→, ∷, ←, etc.), where in the code would that someone best start?

Aug 29 '19 07:08 mb720
