sublime-markdown-popups icon indicating copy to clipboard operation
sublime-markdown-popups copied to clipboard

Invalid html generated when parsing link

Open rchl opened this issue 2 years ago • 14 comments

Python 3.3>>> mdpopups.show_popup(view, 'and [`i32`](`i32`) is')
Parse Error: ">i32</span>"><code class="highlight"><span style="color: #d8dee9;">i32</span></code></a> is</p></div> code: Unexpected character

The generated HTML is:

and <a href="<span style="color: #d8dee9;">i32</span>"><code class="highlight"><span style="color: #d8dee9;">i32</span></code></a> is

Which is invalid since the content of the href attribute is not escaped.

The escaping would make the code valid but I believe that what should happen is nothing in the link should be transformed into HTML and instead it should be taken literally, and then also escaped if needed.

This is how Github renders it:


and i32 is

and <a href="%60i32%60"><code class="notranslate">i32</code></a> is

rchl avatar Jun 18 '22 18:06 rchl

Yeah, I'm actually aware of this, and I'll have a fix for the next version. There was some complaint at some point about the fact that when Sublime was processing callbacks in HTML links that escaped quotes weren't handled properly by Sublime. A change was made to not escape them, which is simply a bad idea that was rarely encountered. We just need to let them be escaped.

Anyways, this is an issue in mdpopups, not any of the underlying libraries.

facelessuser avatar Jun 18 '22 18:06 facelessuser

And the fixed behavior will also not convert markdown to html in links?

rchl avatar Jun 18 '22 19:06 rchl

And the fixed behavior will also not covert markdown to html in links?

I'm not sure I understand the question. We are talking about how proper escaping in HTML attributes right?

What do you mean by your question, can you clarify with an example?

facelessuser avatar Jun 18 '22 19:06 facelessuser

The fixed logic could result in either:

  • <a href="%60i32%60"> or
  • <a href="&lt;span style=&quot;color: #d8dee9;&quot;&gt;i32&lt;/span&gt;">

I believe that the former one is correct since markdown should not be processed in URIs.

rchl avatar Jun 18 '22 19:06 rchl

Can you give me a reproducible Markdown source? I see the expected outputs, but I'm still not sure what you are providing as source.

facelessuser avatar Jun 18 '22 19:06 facelessuser

Wait, are you trying to provide Markdown code syntax as a link? If so, you aren't getting that to convert properly. You need to provide a link or properly escaped content for reference.

facelessuser avatar Jun 18 '22 19:06 facelessuser

You'll see all sorts of different outputs with different Markdown parsers. Some might handle it more sane than others, but you should not be doing this: https://johnmacfarlane.net/babelmark2/?text=%5B%60i32%60%5D(%60i32%60)

facelessuser avatar Jun 18 '22 19:06 facelessuser

I honestly thought you were talking about something else.

facelessuser avatar Jun 18 '22 19:06 facelessuser

It's the same example I gave in the initial comment:

'and [`i32`](`i32`) is'

It contains backticks in the link's URI and it comes from the LSP server.

I've talked about it in the initial comment and also shown how Github's parser handles it.

rchl avatar Jun 18 '22 19:06 rchl

Just escaping the link will fix the main issue that causes parsing error so it might be enough to do that.

Since a link like the one provided here as an example will likely not do anything useful, it might as well contain escaped HTML but it feels like the ultimate fix would be to not process markdown in links, besides escaping.

rchl avatar Jun 18 '22 19:06 rchl

It's the same example I gave in the initial comment:

Yes, I was on my phone, but I see that now. Basically, code is handled before URLs, which is why this happens. That is just the sequencing for how Python Markdown does things.

I've talked about it in the initial comment and also shown how Github's parser handles it.

Yep, how GitHub handles this is irrelevant. Currently, we are using the Python Markdown parser.

I've mentioned at one time that I'm potentially willing to add support for a CommonMark parser as soon as Package Control supports Py38 dependencies, though it's starting to feel like they may never happen...

Anyways, if you format the content in an escaped manner, it'll probably go through fine. I don't plan on patching the underlying library to workaround odd cases that it was not designed to handle. One could argue this is a bug, but it would be an upstream bug with Python Markdown, but as I'm familiar with its architecture, I'm doubtful a fix will be coming.

facelessuser avatar Jun 18 '22 20:06 facelessuser

@rchl Hmm, there may be an issue with how we do things using the Sublime highlighter. Turning off the Sublime highlighter (which causes Pygments to be used instead) doesn't seem to exhibit this issue.

I wanted to do my due diligence before closing this issue, so there may actually be something for us to fix here. I'll try to dig deeper.

facelessuser avatar Jun 19 '22 03:06 facelessuser

Hmm, it just appears that when Pygments handles a plain text code, it simply returns the text and styles it with CSS, but with the Sublime highlighter, we actually add inline styles, which is why we get HTML content. Really, Markdown should not be used as a URL.

I'm not sure if there is anything we can reasonably do, but I'll take a look if there is something simple we can do without trying to monkey patch Markdown itself.

facelessuser avatar Jun 19 '22 03:06 facelessuser

Yeah, we can break pygments with [`123`](`#!py3 123`) which will then perform highlighting which causes spans to be written:

<p><a href="<span class="mi">123</span>
"><code class="highlight">123
</code></a></p>

facelessuser avatar Jun 19 '22 04:06 facelessuser