
Recommended way to parse surrogate pairs?

Open aggieben opened this issue 4 years ago • 0 comments

I'm working on a TOML parser, and I'm at a loss for how to parse Unicode characters that are encoded as surrogate pairs in UTF-16, that is, code points above U+FFFF (I mention TOML because these code points are valid in it). I'm not deeply familiar with the `CharStream` in FParsec, but on a first reading it doesn't seem to have any notion of surrogates, and it deals entirely with sequences of individual characters of type `char`.
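To make the problem concrete, here is a minimal sketch of the pairing step that a parser working on individual UTF-16 code units would have to perform. It is written in TypeScript only because its strings, like .NET's, are sequences of 16-bit code units. The names `combineSurrogates` and `decodeCodePoints` are made up for illustration and are not part of FParsec or any library.

```typescript
// Combine a UTF-16 surrogate pair into a single Unicode code point.
// Returns null when the two code units do not form a valid pair.
function combineSurrogates(high: number, low: number): number | null {
  const isHigh = high >= 0xd800 && high <= 0xdbff; // high (lead) surrogate range
  const isLow = low >= 0xdc00 && low <= 0xdfff;    // low (trail) surrogate range
  if (!isHigh || !isLow) return null;
  // Each surrogate carries 10 bits; the result is offset by 0x10000.
  return 0x10000 + ((high - 0xd800) << 10) + (low - 0xdc00);
}

// Walk a string one code unit at a time (the way a char-based stream would)
// and emit whole code points, merging surrogate pairs where they occur.
// Unpaired surrogates are passed through unchanged in this sketch.
function decodeCodePoints(s: string): number[] {
  const out: number[] = [];
  for (let i = 0; i < s.length; i++) {
    const unit = s.charCodeAt(i);
    const combined =
      i + 1 < s.length ? combineSurrogates(unit, s.charCodeAt(i + 1)) : null;
    if (combined !== null) {
      out.push(combined);
      i++; // the low surrogate has been consumed as well
    } else {
      out.push(unit);
    }
  }
  return out;
}
```

For example, U+1F600 is stored as the two code units `0xD83D` and `0xDE00`, so a stream that hands out one `char` at a time sees two separate "characters" where TOML sees one.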

Is there a way to parse surrogate pairs?

aggieben · Feb 06 '21 21:02