Recommended way to parse surrogate pairs?
I'm working on a TOML parser, and I'm at a loss for how to parse Unicode characters that are represented as surrogate pairs in UTF-16 (I mention TOML because these code points are valid in it). I'm not deeply familiar with the CharStream in FParsec, but on a first reading it doesn't seem to have any notion of surrogates, and it deals entirely with sequences of individual characters of type char.
Is there a way to parse surrogate pairs?