haddock
Character "©" in comment leads to exception: commitAndReleaseBuffer: invalid argument (invalid character)
The following haddock comment
...
-- | ...a ....
{-|
Prefix: xyz
Characteristics
* does this and this
* e.g. bla... (' \0')
* e.g. ... (' ')
* e.g. ... 'A'
* e.g. ... '©'
* regarding particular functions
* ...
* ...
* ...
* ...
* ...
-}
led to the following output, after which Haddock (version 2.24.0) stopped:
Warning: '<stderr>: commitAndReleaseBuffer: invalid argument (invalid character)
After removing the line containing '©', the error (reported as a warning) disappeared.
Now, I am using
Haddock version 2.25.1, (c) Simon Marlow 2006
and now it just stops, without any message at all.
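The message itself comes from GHC's handle layer: the program wrote a character to a Handle (here <stderr>) whose text encoding cannot represent it, for example a legacy console code page. Since another comment in this thread reports Haddock warning "'©' is out of scope", one plausible reading (an assumption, not confirmed here) is that the failure happened while printing that warning. A common workaround in Haskell programs that control their own entry point is to switch the standard handles to UTF-8 before printing anything. `useUtf8StdHandles` is a hypothetical helper for illustration, not something Haddock provides:

```haskell
import GHC.IO.Encoding (setLocaleEncoding)
import System.IO

-- | Hypothetical helper: use UTF-8 for stdout/stderr and as the default
-- encoding for handles opened afterwards, so that characters such as '©'
-- can be written without a coding failure.
useUtf8StdHandles :: IO ()
useUtf8StdHandles = do
  -- Default encoding for files opened from now on.
  setLocaleEncoding utf8
  -- Handles that are already open keep their encoding, so set them explicitly.
  mapM_ (`hSetEncoding` utf8) [stdout, stderr]
```

This only fixes the Haskell side; the terminal still has to interpret the bytes as UTF-8 to display them correctly (on Windows, for example, after `chcp 65001`).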
@JoergBrueggmann Could you tell me what locales your shell uses? :)
On my machine with UTF-8 locales, it is successfully converted to ©.
Just another data point. In en_US.UTF-8, I get
'©' is out of scope.
with Haddock 2.24 and 2.26 (presumably because quotation marks are supposed to denote Haskell identifiers that Haddock will try to hyperlink).
I am not exactly sure what you mean by "what locales your shell uses". Maybe this answers your question: I am using VSCode, which can handle different encodings. The file is encoded as UTF-8 with BOM. If this doesn't answer your question, please let me know. Thank you.
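For reference, a UTF-8 byte order mark is the UTF-8 encoding of U+FEFF, i.e. the three bytes EF BB BF at the very start of a file. A minimal sketch of detecting and stripping it from raw bytes, using plain `[Word8]` lists to avoid extra dependencies; `utf8Bom` and `stripUtf8Bom` are hypothetical names:

```haskell
import Data.Word (Word8)

-- | The UTF-8 encoding of U+FEFF, used as a byte order mark.
utf8Bom :: [Word8]
utf8Bom = [0xEF, 0xBB, 0xBF]

-- | Drop a leading UTF-8 BOM if present; any other input is returned unchanged.
stripUtf8Bom :: [Word8] -> [Word8]
stripUtf8Bom bytes
  | take 3 bytes == utf8Bom = drop 3 bytes
  | otherwise = bytes
```

When reading through a `Handle`, the `utf8_bom` encoding from `System.IO` skips a leading BOM automatically, so manual stripping like this is only needed for raw byte input.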
Well, the code point of "©" is U+00A9 (Unicode), which UTF-8 encodes as the two bytes 0xC2 0xA9 (the single byte 0xA9 is its encoding in Latin-1 and Windows-1252).
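The byte values can be checked with a small hand-written encoder. This sketch follows the UTF-8 bit layout from RFC 3629 and confirms that U+00A9 becomes C2 A9; `encodeUtf8Char` is a hypothetical name, and for brevity it does not reject surrogate code points:

```haskell
import Data.Bits (shiftR, (.&.), (.|.))
import Data.Char (ord)
import Data.Word (Word8)

-- | Encode one code point as UTF-8 bytes (1 to 4 bytes, per RFC 3629).
encodeUtf8Char :: Char -> [Word8]
encodeUtf8Char c
  | n < 0x80 = [byte n]
  | n < 0x800 = [byte (0xC0 .|. shiftR n 6), cont n]
  | n < 0x10000 = [byte (0xE0 .|. shiftR n 12), cont (shiftR n 6), cont n]
  | otherwise =
      [byte (0xF0 .|. shiftR n 18), cont (shiftR n 12), cont (shiftR n 6), cont n]
  where
    n = ord c
    byte = fromIntegral :: Int -> Word8
    -- Continuation byte 10xxxxxx carrying the low six bits of its argument.
    cont x = byte (0x80 .|. (x .&. 0x3F))
```

In GHCi, `encodeUtf8Char '©'` yields `[194,169]`, which is 0xC2 0xA9 written in decimal.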
Exactly, the single quotation marks denote Haskell identifiers. It seems that Haddock cannot handle (all) Haskell identifiers that are encoded in UTF-8 and contain code points above U+007F.
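Whether the quoted text counts as an identifier depends on how its characters lex. Under the Haskell 2010 lexical syntax, identifiers may contain non-ASCII letters, while any Unicode symbol or punctuation character (apart from a few special characters) is an operator symbol. '©' has general category `OtherSymbol`, so it is not a letter but could lex as an operator, which would fit the "'©' is out of scope" warning. The sketch below is a simplified classifier built on `Data.Char.generalCategory`; that Haddock applies exactly these rules is an assumption, and `CharKind`/`classifyChar` are hypothetical names:

```haskell
import Data.Char (GeneralCategory (..), generalCategory, isAlpha, isDigit)

-- | Simplified view of how a single character would lex in Haskell 2010.
data CharKind = IdentChar | OperatorChar | OtherChar
  deriving (Eq, Show)

classifyChar :: Char -> CharKind
classifyChar c
  -- Letters (including non-ASCII ones), ASCII digits, '_' and '\''.
  | isAlpha c || isDigit c || c == '_' || c == '\'' = IdentChar
  -- ASCII operator symbols.
  | c `elem` "!#$%&*+./<=>?@\\^|-~:" = OperatorChar
  -- Non-ASCII symbols and (a subset of) punctuation.
  | c > '\x7F' && generalCategory c `elem` symbolOrPunct = OperatorChar
  | otherwise = OtherChar
  where
    symbolOrPunct =
      [ MathSymbol, CurrencySymbol, ModifierSymbol, OtherSymbol
      , ConnectorPunctuation, DashPunctuation, OtherPunctuation ]
```

With this classification, `classifyChar 'λ'` is `IdentChar` while `classifyChar '©'` is `OperatorChar`.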
Please try enabling {-# LANGUAGE UnicodeSyntax #-} to handle Unicode identifiers. Is it any better if you save the file without BOM?
(The error message commitAndReleaseBuffer: invalid argument (invalid character) is truly abhorrent. Any volunteers to make https://gitlab.haskell.org/ghc/ghc/-/blob/master/libraries/base/GHC/IO/Encoding/Failure.hs more helpful?)
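The linked module defines what GHC does when a character cannot be encoded or decoded (its `CodingFailureMode`: raise an error, ignore, transliterate, or round-trip). Apart from improving the message, a program can opt into a more forgiving mode by appending a suffix such as "//TRANSLIT" to the encoding name passed to `mkTextEncoding`. A minimal sketch, assuming the handle is in text mode; `makeForgiving` is a hypothetical helper:

```haskell
import System.IO

-- | Hypothetical helper: keep the handle's current encoding, but replace
-- characters it cannot represent instead of throwing an exception.
-- Handles in binary mode (no text encoding) are left untouched.
makeForgiving :: Handle -> IO ()
makeForgiving h = do
  current <- hGetEncoding h
  case current of
    Nothing -> pure ()
    Just enc -> do
      -- show gives the encoding's name, e.g. "UTF-8"; drop any existing suffix.
      let baseName = takeWhile (/= '/') (show enc)
      forgiving <- mkTextEncoding (baseName ++ "//TRANSLIT")
      hSetEncoding h forgiving
```

After `makeForgiving stderr`, unencodable characters are replaced (typically by '?') rather than aborting the program, which trades exact output for robustness.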
Nice to "see" you again.
Neither enabling {-# LANGUAGE UnicodeSyntax #-} nor saving the file without BOM (using Notepad++) makes any difference.
What exactly are you looking for? I am not familiar with the concept behind "...IO/Encoding/Failure.hs". Could you provide some links with more details? Background: I am going to build a compiler-compiler in Haskell. For file IO, I am therefore creating a library that deals with different character encodings in a completely different way. Maybe we can find some synergy.
I raised https://gitlab.haskell.org/ghc/ghc/-/issues/21389 to improve relevant error messages.
@Kleidukos @ulysses4ever I assume Haddock could catch this exception to provide a better user experience.
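As a sketch of what catching it might look like (hypothetical helper names, not Haddock's actual code), a top-level handler can recognise the error by its `IOErrorType` and print a hint before rethrowing. It relies on the GHC-specific `IOError` record from `GHC.IO.Exception`, and the matched description text is taken from the message quoted in this thread, so it may differ between GHC versions:

```haskell
import Control.Exception (catch, throwIO)
import Data.List (isInfixOf)
import GHC.IO.Exception (IOErrorType (InvalidArgument), IOException (..))
import System.IO (hPutStrLn, stderr)

-- | Hypothetical helper: recognise the failure shown as
-- "commitAndReleaseBuffer: invalid argument (invalid character)".
encodingFailureHint :: IOException -> Maybe String
encodingFailureHint e
  | ioe_type e == InvalidArgument
  , "invalid character" `isInfixOf` ioe_description e =
      Just ("A character could not be encoded for output; a UTF-8 locale "
            ++ "or console code page (e.g. chcp 65001 on Windows) may help.")
  | otherwise = Nothing

-- | Run an action; on that particular failure, print a hint (ASCII only,
-- so the hint itself is always encodable) and rethrow the original error.
withEncodingHint :: IO a -> IO a
withEncodingHint act = act `catch` \e -> do
  mapM_ (hPutStrLn stderr) (encodingFailureHint e)
  throwIO e
```

Rethrowing keeps the original exit behaviour; a real tool might instead exit with a dedicated status code.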
@JoergBrueggmann I'm not a maintainer here, but it could help if you share a standalone reproducer.
Do you mean a small Haskell project, e.g. on GitHub, that reproduces the bug?
Yes, a small package such that cabal haddock fails on it.
OK, I will. I have started creating such a package. Unfortunately, after reducing the original project to a smaller package, it behaves differently :-( The original version stops without any message, while the reduced one prints an error message. I will resume tomorrow.
@Bodigrim Yup. This goes on the TODO list. :)
@Kleidukos, @ulysses4ever, @Bodigrim, please find the standalone project to reproduce the bug in repository https://github.com/JoergBrueggmann/HaddockIssue1472
If you have questions regarding the project, please let me know.
@JoergBrueggmann thanks for the standalone reproducer. Unfortunately, it builds and renders fine on my end. Assuming you're on Linux, could you copy and paste here the output of running env | grep LANG in your terminal?
@ulysses4ever, I am working with Stack on Windows, and hence MSYS2. env is available, but grep is not. env prints the following:
...
LANG=en_US.UTF-8
...
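LANG=en_US.UTF-8 is what the MSYS2 shell reports, but as far as I know GHC on Windows derives its default text encoding from the Windows code pages rather than from LANG, so the two can disagree. A small diagnostic, compiled with the same GHC that Stack uses for Haddock, would show what the Haskell side actually sees. `encodingReport` is a hypothetical name; it only uses functions from `base`:

```haskell
import Data.Maybe (fromMaybe)
import GHC.IO.Encoding (getFileSystemEncoding, getLocaleEncoding)
import System.Environment (lookupEnv)
import System.IO (hGetEncoding, stderr, stdout)

-- | Hypothetical diagnostic: list the encodings GHC's runtime uses for
-- text I/O, next to the LANG variable reported by the shell.
encodingReport :: IO [(String, String)]
encodingReport = do
  lang <- lookupEnv "LANG"
  locale <- getLocaleEncoding
  fileSystem <- getFileSystemEncoding
  out <- hGetEncoding stdout
  err <- hGetEncoding stderr
  pure
    [ ("LANG", fromMaybe "<unset>" lang)
    , ("locale encoding", show locale)
    , ("file system encoding", show fileSystem)
    -- hGetEncoding returns Nothing for handles in binary mode.
    , ("stdout", maybe "<binary>" show out)
    , ("stderr", maybe "<binary>" show err)
    ]
```

From GHCi, `encodingReport >>= mapM_ print` prints one pair per line; a "stderr" entry other than UTF-8 would be consistent with the '©' failure above.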