xml2rfc
Mangling of names
type_defect
| by [email protected]
I have XML input that includes the character 'é' (an e with accent). This is rendered literally as "&#233;" in both text and HTML outputs. Not the HTML escape sequence, mind; the HTML source is &amp;#233;.
There is no good reason for this substitution, especially in the text version. It seems like a bug.
I understand that there might be some assumption that a person's name should use <contact> or some such, but the silent rewriting is unwelcome. Allowing the character would be ideal. Generating an error when inputs contain characters that won't be properly retained would be a tolerable solution.
Issue migrated from trac:628 at 2022-02-08 07:14:45 +0000
@[email protected] changed status from new to under_review
I have the same problem here: https://www.ietf.org/archive/id/draft-ietf-cellar-flac-09.html#name-uncommon-bit-depth-2
The xml source says
<li>8-bit µ-law can be losslessly converted to 14 bit (Linear) PCM</li>
However, the HTML source says
<li class="compact" id="appendix-C.6-2.4">8-bit &amp;#181;-law can be losslessly converted to 14 bit (Linear) PCM<a href="#appendix-C.6-2.4" class="pilcrow">¶</a>
The problem here is that the & should not be escaped when it is in fact the start of an escape already.
This seems to be related to #924, #832 and #767
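For anyone trying to picture the double escaping, here is a minimal Python sketch of one way such output can arise. It is an assumption about the mechanism, not xml2rfc's actual code: the helper names `to_ascii_with_charrefs` and `render_text_node` are hypothetical, and only the standard library is used.

```python
import html


def to_ascii_with_charrefs(text: str) -> str:
    """Replace every non-ASCII character with a decimal character reference."""
    return text.encode("ascii", "xmlcharrefreplace").decode("ascii")


def render_text_node(text: str) -> str:
    """Hypothetical renderer: downcode first, then HTML-escape the result.

    Escaping after downcoding turns the '&' of the freshly created
    reference into '&amp;', which is the double escaping reported above.
    """
    return html.escape(to_ascii_with_charrefs(text), quote=False)


source = "8-bit µ-law"
downcoded = to_ascii_with_charrefs(source)   # reference created here
rendered = render_text_node(source)          # '&' escaped a second time
displayed = html.unescape(rendered)          # what a browser would show
```

Under these assumptions the browser shows the character reference itself rather than µ, because only one level of escaping is undone when the page is displayed.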
Yes.
Like many other element names, 'li' needs to be added to unicode_content_tags in unicode.py.
Workaround made possible by the grammar of a <li>: Put a <t> ... </t> around the text.
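The allowlist idea and the workaround can be sketched together in Python. This is an illustration under stated assumptions, not the contents of unicode.py: the set below is a hypothetical stand-in for unicode_content_tags (the only entry taken from this thread is 't', since wrapping the text in it works), and `elements_needing_allowlist` is an invented helper.

```python
import xml.etree.ElementTree as ET

# Hypothetical stand-in for unicode_content_tags; not the real list.
allowed_before_fix = {"t"}
allowed_after_fix = allowed_before_fix | {"li"}


def has_non_ascii(text):
    """True if text is a string containing at least one non-ASCII character."""
    return text is not None and not text.isascii()


def elements_needing_allowlist(xml_text, allowed_tags):
    """Return tags whose own text has non-ASCII characters but are not allowed."""
    offenders = []
    for elem in ET.fromstring(xml_text).iter():
        # Text that belongs directly to elem: its leading text plus the
        # tail text that follows each of its children.
        direct_text = [elem.text] + [child.tail for child in elem]
        if elem.tag not in allowed_tags and any(map(has_non_ascii, direct_text)):
            offenders.append(elem.tag)
    return offenders


bare = "<ul><li>8-bit µ-law</li></ul>"
wrapped = "<ul><li><t>8-bit µ-law</t></li></ul>"

flagged_bare = elements_needing_allowlist(bare, allowed_before_fix)
flagged_after_fix = elements_needing_allowlist(bare, allowed_after_fix)
flagged_wrapped = elements_needing_allowlist(wrapped, allowed_before_fix)
```

In this sketch the bare list item is flagged until 'li' joins the allowlist, while the wrapped form passes either way because the non-ASCII text then belongs to the inner element.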
Thanks for the pointer!