U+0000 not converted to U+FFFD in all cases
The CommonMark Spec says:
For security reasons, the Unicode character U+0000 must be replaced with the REPLACEMENT CHARACTER (U+FFFD).
However pulldown-cmark doesn't preform this conversion in some cases. Take the following example (with "NUL" replaced with actual null bytes):
# NUL
NUL
�
[NUL](NUL)
`NUL`
```
NUL
```
dingus produces the following:
<h1>�</h1>
<p>�</p>
<p>�</p>
<p><a href="%EF%BF%BD">�</a></p>
<p><code>�</code></p>
<pre><code>�
</code></pre>
pulldown-cmark produces (with "NUL" replaced with actual null bytes):
<h1>�</h1>
<p>�</p>
<p>�</p>
<p><a href="%00">�</a></p>
<p><code>NUL</code></p>
<pre><code>NUL
</code></pre>
The link destination is different and the null byte isn't converted in the code span or block.
Hmmm.. the master branch now does replace the nulls inside the code blocks, but doesn't recognize the links any more. Double checking the spec to see if this is intended behavior..
I think this remains a TODO. I've personally prioritized it pretty low because nulls in strings are handled smoothly by just about everything in the Rust ecosystem, but in the C things are dicier.
It looks like we break here: https://github.com/raphlinus/pulldown-cmark/blob/master/src/parse.rs#L1892
Do you remember why these first ASCII bytes are treated specially? Can't find any reference to them in the spec.
Yes, the spec for link-destination says "a nonempty sequence of characters that does not include ASCII space or control characters", and the code is supposed to reflect that. That's not saying I got this completely right.
Right, thanks. I think we may actually be compliant in that case?
My reading is that this is a spec ambiguity. Should the NUL->U+FFFD replacement happen before or after parsing of link declarations? The spec doesn't say. If it's after, then not recognizing the links is correct. But before is what the dingus appears to do.
This is what I would call an extreme edge case. But I think we should file a number of spec clarification issues against CommonMark, and this is definitely one of them.