Error converting UTF-8 accented character to djot
I get this error with the latest version of pandoc (3.1.12.2):
pandoc: Cannot decode byte '\xc3': Data.Text.Encoding: Invalid UTF-8 stream
when I try to convert a file to the djot format:
pandoc -f html -t djot test.html
where test.html is a UTF-8 encoded file with content like this:
<h2>test di convertibilità</h2>
What causes the error is the "à" character.
This regression was introduced in the latest patch release of pandoc: version 3.1.12.1 works without errors (I'm testing the amd64 binary on Debian bookworm).
I can't really help unless you can upload a (minimal) file to test with.
Here it is (I had to zip it, because I can't upload HTML): test.zip
Here's a more minimal case:
% pandoc -t djot -f native
Header 2 ("",[],[]) [Str "\224"]
^D
pandoc: Cannot decode byte '\xc3': Data.Text.Encoding: Invalid UTF-8 stream
Looking at the changelog for jgm/djoths, I suspect this is due to https://github.com/jgm/djoths/issues/1:
* Djot.Blocks: use ByteString directly in `toIdentifier` (#1,
Vaibhav Sagar).
EDIT: I see the issue: https://github.com/jgm/djoths/pull/1/files#r1510547477
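For anyone wondering why the message names byte '\xc3': here is a small illustration in Python (a sketch, not the djoths code). `Str "\224"` is "à" (U+00E0), which UTF-8 encodes as the two bytes C3 A0. Decoding the lead byte C3 on its own cannot succeed, which is consistent with the error above. The `decode_bytewise` helper is hypothetical and only assumes that the bytes are decoded one at a time somewhere along the way; see the linked review comment for the actual code in question.

```python
# Illustration only: this is NOT the djoths implementation.
text = "\u00e0"                  # "à"; Haskell's "\224" is decimal 224 = 0xE0
encoded = text.encode("utf-8")   # two bytes: 0xC3 0xA0


def decode_bytewise(data: bytes) -> str:
    """Hypothetical buggy helper: decodes every byte as if it were a whole character."""
    return "".join(bytes([b]).decode("utf-8") for b in data)


failed_byte = None
try:
    decode_bytewise(encoded)
except UnicodeDecodeError as err:
    # err.object holds the bytes being decoded, err.start the failing offset
    failed_byte = err.object[err.start]

# Decoding the complete byte sequence round-trips without error.
roundtrip = encoded.decode("utf-8")
```

Here `failed_byte` ends up as 0xC3, the same byte named in the pandoc error, while decoding the whole sequence at once returns the original character.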