libpandoc

How to deal with unicode characters?

Open itechbear opened this issue 9 years ago • 4 comments

It seems pandoc() doesn't support Unicode characters. It just reports that the code points of Unicode characters are outside the ASCII char range [0,255)

itechbear avatar Apr 09 '16 09:04 itechbear

I should look into it. I see in 8ec4ac97 that the original author of libpandoc had changed the interface from wchar_t * to char *, adding a comment that all strings should be encoded as UTF-8. There's no explanation as to why!

ShabbyX avatar Jan 23 '17 03:01 ShabbyX
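
For anyone holding text as wide characters or raw code points, here is a minimal sketch of what "encode as UTF-8 before passing a char *" amounts to. The helper name `utf8_encode` is hypothetical and is not part of libpandoc; it only illustrates the standard UTF-8 encoding rules, under the assumption that the caller has one Unicode code point at a time.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper, NOT part of libpandoc: encode one Unicode code point
 * as UTF-8 into out (which must hold at least 4 bytes).
 * Returns the number of bytes written, or 0 if cp is not encodable. */
size_t utf8_encode(uint32_t cp, unsigned char *out)
{
    if (cp < 0x80) {                       /* 1 byte: plain ASCII */
        out[0] = (unsigned char)cp;
        return 1;
    }
    if (cp < 0x800) {                      /* 2 bytes: 110xxxxx 10xxxxxx */
        out[0] = (unsigned char)(0xC0 | (cp >> 6));
        out[1] = (unsigned char)(0x80 | (cp & 0x3F));
        return 2;
    }
    if (cp < 0x10000) {                    /* 3 bytes */
        if (cp >= 0xD800 && cp <= 0xDFFF)  /* surrogates are not valid */
            return 0;
        out[0] = (unsigned char)(0xE0 | (cp >> 12));
        out[1] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
        out[2] = (unsigned char)(0x80 | (cp & 0x3F));
        return 3;
    }
    if (cp <= 0x10FFFF) {                  /* 4 bytes */
        out[0] = (unsigned char)(0xF0 | (cp >> 18));
        out[1] = (unsigned char)(0x80 | ((cp >> 12) & 0x3F));
        out[2] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
        out[3] = (unsigned char)(0x80 | (cp & 0x3F));
        return 4;
    }
    return 0;                              /* beyond the Unicode range */
}
```

With this sketch, U+00E9 ("é") becomes the two bytes 0xC3 0xA9. On platforms where wchar_t is 16 bits, surrogate pairs would have to be combined into a single code point first; that step is left out here.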

:+1: for this issue. It seems that whatever accented character I feed into libpandoc, I get the character with code 65533 as output.

Typically, input "é" gives "�".

Phyks avatar Jul 08 '17 11:07 Phyks
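
Code 65533 is U+FFFD, the Unicode replacement character, which decoders commonly emit when the input bytes are not valid in the expected encoding. The sketch below shows one way that can happen, assuming (this is not confirmed anywhere in this thread) that the bytes are being decoded as UTF-8 somewhere along the way. `utf8_decode` and `REPLACEMENT_CHAR` are hypothetical names, not libpandoc API.

```c
#include <stddef.h>
#include <stdint.h>

#define REPLACEMENT_CHAR 0xFFFDu  /* decimal 65533 */

/* Hypothetical helper, NOT part of libpandoc: decode one code point from
 * the first bytes of s (len bytes available). Stores the number of bytes
 * consumed in *consumed and returns the code point, or REPLACEMENT_CHAR
 * (consuming one byte) when the sequence is malformed. */
uint32_t utf8_decode(const unsigned char *s, size_t len, size_t *consumed)
{
    uint32_t cp, min;
    size_t need, i;

    if (len == 0) {
        *consumed = 0;
        return REPLACEMENT_CHAR;
    }
    *consumed = 1;                       /* on error, skip a single byte */

    if (s[0] < 0x80)
        return s[0];                     /* ASCII */
    else if ((s[0] & 0xE0) == 0xC0) { need = 1; cp = s[0] & 0x1F; min = 0x80; }
    else if ((s[0] & 0xF0) == 0xE0) { need = 2; cp = s[0] & 0x0F; min = 0x800; }
    else if ((s[0] & 0xF8) == 0xF0) { need = 3; cp = s[0] & 0x07; min = 0x10000; }
    else
        return REPLACEMENT_CHAR;         /* stray continuation or bad lead byte */

    if (len < need + 1)
        return REPLACEMENT_CHAR;         /* truncated sequence */

    for (i = 1; i <= need; i++) {
        if ((s[i] & 0xC0) != 0x80)
            return REPLACEMENT_CHAR;     /* expected a 10xxxxxx byte */
        cp = (cp << 6) | (s[i] & 0x3F);
    }

    /* Reject overlong forms, surrogates and out-of-range values. */
    if (cp < min || cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF))
        return REPLACEMENT_CHAR;

    *consumed = need + 1;
    return cp;
}
```

Given the UTF-8 bytes 0xC3 0xA9, this returns 0xE9 ("é"). Given the single Latin-1 byte 0xE9, it returns 65533, which matches the symptom described above if the input reaches the decoder in a non-UTF-8 encoding.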

This is doable if I simply duplicate everything and change all CStrings to CWStrings, but that would get quite ugly. I'll see if I can refactor some stuff to do this in a less dirty way (having a newborn doesn't help with finding time either!!)

ShabbyX avatar Jul 10 '17 18:07 ShabbyX

Ok I'll try to have a look at this hack around CWStrings. No problem, thanks for this lib!

Phyks avatar Jul 10 '17 21:07 Phyks