Code points, scalar values, and validity

Open • dpk opened this issue 1 year ago • 2 comments

  • The spec defines a character as a ‘Unicode code point’. This means that (unpaired) surrogates, U+D800 to U+DFFF, are allowed in input and, by implication, in output. If this is not intended (which is what I glean from the answer to #614), the definition should be changed to ‘Unicode scalar value’. Changing ‘invalid Unicode code points’ to ‘invalid Unicode scalar values’ would also resolve #614.

  • The spec does not explicitly state that every possible sequence of Unicode scalar values (or code points?) is valid CommonMark input for which some HTML output must be produced, although I believe this is the intention. If so, it should be made explicit that a processor which fails to parse any input document is non-conforming.
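To make the code point vs. scalar value distinction concrete, here is a minimal Python sketch (not taken from the spec or any reference implementation). It assumes that ‘invalid’ in the numeric character reference rule is read as ‘not a Unicode scalar value’, as proposed in the first point; the helper names are hypothetical. Under that reading, a decoder never emits a lone surrogate, so every input still yields output that can be encoded as UTF-8.

```python
# Sketch only: code points vs. Unicode scalar values.
# Assumption: "invalid" in the numeric-character-reference rule means
# "not a Unicode scalar value" (the reading proposed in this issue).

REPLACEMENT_CHARACTER = "\uFFFD"

def is_code_point(cp: int) -> bool:
    """True for any value in the Unicode codespace U+0000..U+10FFFF."""
    return 0 <= cp <= 0x10FFFF

def is_scalar_value(cp: int) -> bool:
    """True for a code point that is not a surrogate (U+D800..U+DFFF)."""
    return is_code_point(cp) and not (0xD800 <= cp <= 0xDFFF)

def decode_numeric_reference(cp: int) -> str:
    """Hypothetical helper: map the number in &#...; or &#x...; to text.

    U+0000 is replaced for security reasons; anything that is not a
    scalar value is treated as invalid and also replaced.
    """
    if cp == 0 or not is_scalar_value(cp):
        return REPLACEMENT_CHARACTER
    return chr(cp)

if __name__ == "__main__":
    for cp in (0x41, 0x0, 0xD800, 0x10FFFF, 0x110000):
        print(f"U+{cp:04X}", is_code_point(cp), is_scalar_value(cp),
              repr(decode_numeric_reference(cp)))

    # A lone surrogate is a code point, so a Python str can hold it,
    # but it cannot be encoded as UTF-8 output.
    try:
        "\ud800".encode("utf-8")
    except UnicodeEncodeError:
        print("lone surrogate cannot be encoded as UTF-8")
```

Note that Python's `chr(0xD800)` succeeds, which is exactly why the explicit surrogate check is needed before producing output.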

dpk • Sep 29 '24 12:09

See also https://github.com/commonmark/commonmark-spec/issues/369

dbuenzli • Sep 29 '24 13:09

https://github.com/commonmark/commonmark-spec/pull/795

notriddle • May 01 '25 18:05