commonmark-java

Escaping HTML must be done wisely

Open: Rtaglia opened this issue 2 years ago · 0 comments

Hi,

I do not understand why an HTML entity such as `&euro;` is converted into the Unicode character € during HTML rendering.

The string `&euro;` is already HTML-ready, whereas the raw Unicode character € cannot be represented in many HTML file encodings (ISO-8859-1, for example, has no € sign). So please do not convert all HTML entities.
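For context, the CommonMark spec's own examples decode entity references into the literal character (for example `&copy;` is rendered as ©), so decoding `&euro;` to € follows the spec. If the only concern is that the output file's encoding may not support €, one possible workaround is to post-process the rendered HTML and replace every non-ASCII character with a numeric character reference. This is a minimal sketch under that assumption, not a commonmark-java feature; `NumericRefEncoder` and `encodeNonAscii` are hypothetical names.

```java
// Hypothetical post-processing step applied to already-rendered HTML.
// It is not part of commonmark-java.
class NumericRefEncoder {

    /** Replaces every non-ASCII code point with a decimal numeric character reference. */
    static String encodeNonAscii(String html) {
        StringBuilder out = new StringBuilder(html.length());
        // codePoints() yields full Unicode code points, so surrogate pairs are handled correctly.
        html.codePoints().forEach(cp -> {
            if (cp < 0x80) {
                out.appendCodePoint(cp);                 // ASCII: copy unchanged
            } else {
                out.append("&#").append(cp).append(';'); // e.g. the euro sign U+20AC becomes &#8364;
            }
        });
        return out.toString();
    }

    public static void main(String[] args) {
        String rendered = "<h1>This is <em>Sparta</em> in \u20AC</h1>";
        System.out.println(encodeNonAscii(rendered));
    }
}
```

Running `main` prints `<h1>This is <em>Sparta</em> in &#8364;</h1>`. Since markup and existing references such as `&amp;` are pure ASCII, they pass through unchanged, and `&#8364;` displays as € in any ASCII-compatible encoding.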

Another point about HTML rendering: if the source text already contains correct, valid W3C HTML tags such as `<div />`, do not convert them to `&lt;div /&gt;`. Keep them intact in the rendered output and respect the author's choice to format the output with HTML tags.

However, if the source text contains invalid HTML, that is, characters such as `<` or `>`, or tags that are not standard HTML elements such as `<MyTag />`, please escape them (for example to `&lt;MyTag /&gt;`). Such tags are not there to format the HTML output; they are content to be displayed.
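As far as I know, commonmark-java's `HtmlRenderer` only offers an all-or-nothing `escapeHtml` builder option, which escapes every raw HTML tag. The selective behavior requested above could be sketched as a standalone string filter: keep tags whose name is on an allowlist, escape everything else, and leave existing character references such as `&euro;` alone. This is a hypothetical illustration, not an existing API; `SelectiveEscaper`, `escapeUnknownTags` and the abbreviated `KNOWN_TAGS` list are assumptions, and regex-based tag matching is a simplification, not a safe HTML sanitizer.

```java
import java.util.Locale;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical filter illustrating the requested behavior; not part of commonmark-java.
class SelectiveEscaper {

    // Abbreviated allowlist; a real one would list every standard HTML element name.
    private static final Set<String> KNOWN_TAGS = Set.of(
            "div", "span", "p", "hr", "br", "em", "strong", "a", "code", "pre");

    // Matches an opening, closing or self-closing tag and captures its name.
    private static final Pattern TAG =
            Pattern.compile("</?([A-Za-z][A-Za-z0-9]*)\\b[^<>]*>");

    // Matches '&' that does not start a named, decimal or hex character reference.
    private static final Pattern BARE_AMP =
            Pattern.compile("&(?![A-Za-z][A-Za-z0-9]*;|#[0-9]+;|#[xX][0-9A-Fa-f]+;)");

    /** Escapes '<', '>' and bare '&', leaving character references like &euro; intact. */
    static String escapeText(String s) {
        String amp = BARE_AMP.matcher(s).replaceAll("&amp;");
        return amp.replace("<", "&lt;").replace(">", "&gt;");
    }

    /** Keeps tags whose name is in KNOWN_TAGS verbatim and escapes everything else. */
    static String escapeUnknownTags(String input) {
        Matcher m = TAG.matcher(input);
        StringBuilder out = new StringBuilder();
        int last = 0;
        while (m.find()) {
            out.append(escapeText(input.substring(last, m.start()))); // text before the tag
            String name = m.group(1).toLowerCase(Locale.ROOT);
            if (KNOWN_TAGS.contains(name)) {
                out.append(m.group());             // standard tag: keep as markup
            } else {
                out.append(escapeText(m.group())); // unknown tag: show as text
            }
            last = m.end();
        }
        out.append(escapeText(input.substring(last))); // trailing text
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(escapeUnknownTags("<hr /> <MyTag /> 5 < 6 && &euro;"));
    }
}
```

Running `main` prints `<hr /> &lt;MyTag /&gt; 5 &lt; 6 &amp;&amp; &euro;`: the standard `<hr />` survives, while `<MyTag />`, the stray `<` and the bare `&&` are escaped.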

You have done a good job; congratulations to all.

Steps to reproduce the problem (provide example Markdown if applicable):

# This is *Sparta* in &euro; 
<hr />
Class<?> t = null;
if( t == null && t!= null )
        return null;

Expected behavior:

<h1>This is <em>Sparta</em> in &euro;</h1>
<p><hr />
Class&lt;?&gt; t = null;
if( t == null &amp;&amp; t!= null )
        return null;</p>

Actual behavior:

<h1>This is <em>Sparta</em> in €</h1>
<p>&lt;hr /&gt;
Class&lt;?&gt; t = null;
if( t == null &amp;&amp; t!= null )
        return null;</p>

(Also see what the reference implementation does: https://spec.commonmark.org/dingus/)

Rtaglia, Oct 10 '23 12:10