xml2rfc Mangling of names

type_defect | by [email protected]

I have XML input that includes the character 'é' (an e with an acute accent). This is rendered literally as "&#233;" in both text and HTML outputs. Not as the HTML escape sequence, mind; the HTML source is &amp;#233;.

There is no good reason for this substitution, especially in the text version. It seems like a bug.

I understand that there might be some assumption that a person's name uses <contact> or some such, but the silent rewriting is unwelcome. Allowing the character would be ideal. Generating an error when the input contains characters that won't be properly retained would be a tolerable solution.

Issue migrated from trac:628 at 2022-02-08 07:14:45 +0000

Apr 27 '21 06:04 ietf-svn-bot

@[email protected] changed status from new to under_review

Jun 03 '21 21:06 ietf-svn-bot

I have the same problem here: https://www.ietf.org/archive/id/draft-ietf-cellar-flac-09.html#name-uncommon-bit-depth-2

The xml source says

<li>8-bit µ-law can be losslessly converted to 14 bit (Linear) PCM</li>

However, the HTML source says

<li class="compact" id="appendix-C.6-2.4">8-bit &amp;#181;-law can be losslessly converted to 14 bit (Linear) PCM<a href="#appendix-C.6-2.4" class="pilcrow">¶</a>

The problem here is that the & should not be escaped when it is in fact the start of an escape already.
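The symptom can be reproduced outside xml2rfc with a minimal Python sketch. This is not xml2rfc's actual code path: `to_char_ref` is a hypothetical helper that stands in for whatever step downgrades non-ASCII characters to numeric character references, and `html.escape` stands in for the HTML serializer's escaping.

```python
import html


def to_char_ref(text):
    """Hypothetical helper: replace each non-ASCII character with a
    numeric character reference such as &#181;."""
    return "".join(c if ord(c) < 128 else f"&#{ord(c)};" for c in text)


source = "8-bit \u00b5-law"  # \u00b5 is the micro sign from the XML source

# Step 1: the character is downgraded to a character reference.
downgraded = to_char_ref(source)

# Step 2: the already-escaped text is escaped again, so the leading '&'
# of the reference becomes '&amp;' and the reference is shown literally.
double_escaped = html.escape(downgraded, quote=False)

# Escaping the original text only once leaves the character intact.
escaped_once = html.escape(source, quote=False)

print(downgraded)
print(double_escaped)
print(escaped_once)
```

If the first step were skipped for this element, the single escaping pass would leave the character untouched, which is the behaviour the report asks for.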

This seems to be related to #924, #832 and #767

Jul 05 '23 07:07 ktmf01

Yes.

Like many other element names, 'li' needs to be added to unicode_content_tags in unicode.py.

Workaround made possible by the grammar of <li>: put <t> ... </t> around the text.
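For sources with many list items, the workaround could be applied mechanically. The following is a rough sketch using only the Python standard library; `wrap_li_text_in_t` is a hypothetical helper, not part of xml2rfc, and it only handles list items that contain plain text with no child elements.

```python
import xml.etree.ElementTree as ET


def wrap_li_text_in_t(root):
    """Hypothetical helper: move the text of each text-only <li> into a
    new <t> child. Items that already have child elements are skipped."""
    # Materialize the list first so the tree is not modified mid-iteration.
    for li in list(root.iter("li")):
        if len(li) == 0 and li.text and li.text.strip():
            t = ET.SubElement(li, "t")
            t.text = li.text
            li.text = None
    return root


doc = ET.fromstring("<ul><li>8-bit \u00b5-law</li></ul>")
wrap_li_text_in_t(doc)
result = ET.tostring(doc, encoding="unicode")
print(result)
```

List items with mixed content (text plus inline elements) would need more careful handling and are deliberately left alone here.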

Jul 05 '23 07:07 cabo

Thanks for the pointer!

Jul 05 '23 07:07 ktmf01
