commonmark-spec Code points, scalar values, and validity

The spec defines a character as a ‘Unicode code point’. Because surrogates are code points, this means unpaired surrogates are allowed in input and, by implication, in output. If this is not intended (which is what I glean from the answer to #614), the definition should be changed to ‘Unicode scalar value’. Changing ‘invalid Unicode code points’ to ‘invalid Unicode scalar values’ would also resolve #614.
The spec also does not explicitly state that every possible sequence of Unicode scalar values (or code points?) is a valid CommonMark input text for which some HTML output must be produced, although I believe this is the intention. If so, the spec should make explicit that a processor which fails to parse even one input document is non-conforming.
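
To make the distinction concrete, here is a minimal Python sketch. The helper names (`is_code_point`, `is_scalar_value`, `all_scalar_values`, `can_encode_utf8`) are my own, not anything from the spec or from an existing implementation. It relies on the fact that a Python `str` can hold a lone surrogate, which is a code point but not a scalar value and so cannot be encoded as UTF-8.

```python
def is_code_point(cp: int) -> bool:
    """True for any Unicode code point, U+0000 through U+10FFFF."""
    return 0 <= cp <= 0x10FFFF


def is_scalar_value(cp: int) -> bool:
    """True for code points outside the surrogate range U+D800..U+DFFF."""
    return is_code_point(cp) and not (0xD800 <= cp <= 0xDFFF)


def all_scalar_values(text: str) -> bool:
    """True if the text contains no surrogate code points."""
    return all(is_scalar_value(ord(ch)) for ch in text)


def can_encode_utf8(text: str) -> bool:
    """True if the text can be written out as strict UTF-8."""
    try:
        text.encode("utf-8")
        return True
    except UnicodeEncodeError:
        return False


if __name__ == "__main__":
    # Hypothetical input: a lone high surrogate between two ASCII letters.
    sample = "a\ud800b"
    print(all_scalar_values(sample), can_encode_utf8(sample))
```

Under the ‘code point’ wording, `sample` is a sequence of three characters and therefore arguably valid input; under the ‘scalar value’ wording it is not a sequence of characters at all.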

Sep 29 '24 12:09 dpk

See also https://github.com/commonmark/commonmark-spec/issues/369

Sep 29 '24 13:09 dbuenzli

https://github.com/commonmark/commonmark-spec/pull/795

May 01 '25 18:05 notriddle