marked
marked copied to clipboard
HTML renderer should distinguish between block and inline HTML
Describe the bug
<kbd>key</kbd>
doesn't seem to be generating correctly for us. See final rendered output at https://openuserjs.org/about/Falkon#debugging and source for this file here.
See also:
- Report from an Author at https://openuserjs.org/garage/Markdown_doesnt_render_kbd_element_correctly
To Reproduce Steps to reproduce the behavior:
- Backed out the dependency here to 0.8.0 and the issue disappears.
- Rechecked our sanitizer and even didn't sanitize html (code point here with plain return of string) since this is what I thought it might initially be. Still doesn't seem to be generating correctly even at release 1.1.1 with marked.
- Removed markdown rendering (code point here with plain return of string) and key restored working as expected in the browser.
Expected behavior A clear and concise description of what you expected to happen.
With anything after 0.8.0 the output gets rendered to html roughly as
<kbd></kbd>
key
... and more specifically at our document page source with html raw copy of inspection i.e. no line break for clarity:
<kbd></kbd>Ctrl + <kbd></kbd>Shift + <kbd></kbd>i
I realize the demo page doesn't appear to be doing this over there however this is one weird issue.
Should be (and used to be with 0.8.0 of marked):
<kbd>key</kbd>
See also:
- Our server side marked manipulation here with most set options floating around here.
- Our DOM manipulation settings at here.
Any help on understanding this change from 0.8.0 would be greatly appreciated. :) I also hope this issue is not replicated but I did do a brief search on this repo including code for kbd
and only found the smartypants
option which we've never used.
Tested in:
- Mozilla/5.0 (X11; Linux x86_64; rv:79.0) Gecko/20100101 Firefox/79.0 , Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/84.0.4147.105 Safari/537.36 and under Windows to rule out any browser issue.
It looks like it is the sanitization that is closing the open tag before the text then removing the close tag after the text.
Opening and closing inline html tags are tokenized separately so sanitize html is seeing <kbd>
and changing it to <kbd></kbd>
the seeing </kbd>
and removing it.
You can see the difference in tokenization in the v0.8.0 demo and the v1.1.1 demo
Appreciate the look... I'll retest it again... however...
Re:
It looks like it is the sanitization
But my test in dev is this from right above that code a few lines:
function sanitize(aHtml) {
//return sanitizeHtml(aHtml, htmlWhitelistPost);
return aHtml; // Test disabling sanitizer
}
... and it still comes back with:
<kbd></kbd>Ctrl + <kbd></kbd>Shift + <kbd></kbd>i
Here's the retest with no sanitization in dev. Same issue which seems to be highly pointing to marked:
If I do this here:
exports.renderMd = function (aText) {
//return marked(aText);
return aText;
};
... then all is well with the native html (md isn't processed so it's mostly md but the Ctrl + Shift + i is not having issues with sanitizing):
it looks like the line hookNode.innerHTML = sanitized;
also does the same thing
the line
hookNode.innerHTML = sanitized;
also does the same thing
That's just the final sanitized String results (which are bypassed in dev atm from item 2 above and retested/reshown in the first screen cap above) i.e. no change for this equivalent of documentFragment implementation in node.
I'm thinking more along the lines of the renderer
(Camel caps too) portion of the line of var sanitized = sanitize(marked.Renderer.prototype[aType].apply(renderer, arguments));
. If i'm bypassing any and all sanitizing by item number 2 up there with just calling the function and returning no manipulation it could either be some Renderer prototype change in this project or perhaps some worse issue in node.
When I get a few extra cycles I'll try it in latest node@14.x and see if it's a latest node@12.x issue specific to something this package utilizes. EDIT: Same issue in node@14.x.
As I mentioned above this is one weird issue. :\
Misc references for myself here:
- Here's the code compare changes between where the issue appears in this project.
- Notable
renderer
changes at lib/marked.js and lib/marked.esm.js.
That's just the final sanitized String results
yes but it does change the string. JSDom must be doing some sort of sanitization when setting innerHTML
hookNode.innerHTML = '<kbd>';
console.log(hookNode.innerHTML); // '<kbd></kbd>'
hookNode.innerHTML = '</kbd>';
console.log(hookNode.innerHTML); // ''
JSDom must be doing some sort of sanitization
Interesting guess and still appreciate the thoughts but...
It's just a filter for @mention
s. I took the jsdom dep out of the picture early this morning by commenting them out. Same marked issue. Then I redisabled sanitize-html dep and then it went to the output that is on marked demo page (dumped our whole document into marked's server).
It is highly improbable that two extensions/packages would cause this issue not to mention the fact that marked@0.8.0 works perfectly. It's more probable that there is something broken in marked. If given a choice between insecure HTML, losing @mention
s, and using the latest marked, we'll pick secure with sanitize-html along with jsdom and unfortunately have to migrate to another markdown processor or lower the version of marked... but I'd like to keep trying here as I like the project.
Those renderer changes are quite notable between 0.8.0 an 0.8.1 and perhaps it can't be utilized the way we are doing it with overriding the rendering or it's broken for extending the renderer
in marked or there is a node issue floating around. It's going to take me quite some time to figure it out and I'll have to reread the advanced and pro docs here to see if any changes. Sizzle wrote the most of the initial markdown library so I get to go figure out what he did, any changes since then, and verify it.
I'll try backing out the relevant changes in marked manually in dev to see if that helps (or breaks which is always possible).
NOTE: This will change a bit during debugging.
- Replacing 0.8.1's
lib/marked.js
with 0.8.0'slib/marked.js
yields thekbd
issue. - Checking marked's dependency changes between 0.8.0 and 0.8.1 ...
@@ -1,6 +1,6 @@
"version": "7.1.0",
"resolved": "https://registry.npmjs.org/acorn/-/acorn-7.1.0.tgz",
"integrity": "sha512-kL5CuoXA/dgxlBbVrflsflzQ3PAas7RYZB52NOm/6839iVYJgKMJ3cQJD+t2i5+qFa8h3MDpEOJiS64E8JLnSQ==",
"version": "7.1.1",
"resolved": "https://registry.npmjs.org/acorn/-/acorn-7.1.1.tgz",
"integrity": "sha512-add7dgA5ppRPxCFJoAGfMDi7PIBXq1RtGo7BhbLaxwrXPOmw8gq48Y9ozT01hUKy9byMjlR20EJhu5zlkErEkg==",
... acorn changed.
- Replacing 0.8.1's
lib/marked.esm.js
with 0.8.0'slib/marked.esm.js
yields thekbd
issue. - Replacing 0.8.1's
bin/marked
with 0.8.0'sbin/marked
yields thekbd
issue. - ... Continue replacing one file at a time in 0.8.1's with 0.8.0's ...
-
BINGO ....
marked/src/InlineLexer.js
fixes 0.8.1 ... resetting marked dependency and retesting... CONFIRMED
Now I get to go see what changed there.
- From #1602 :
--- ./InlineLexer.js 1985-10-26 02:15:00.000000000 -0600
+++ ~/tmp/marked.0.8.0/src/InlineLexer.js 1985-10-26 02:15:00.000000000 -0600
@@ -82,11 +82,11 @@
}
src = src.substring(cap[0].length);
- out += this.renderer.html(this.options.sanitize
- ? (this.options.sanitizer
+ out += this.options.sanitize
+ ? this.options.sanitizer
? this.options.sanitizer(cap[0])
- : escape(cap[0]))
- : cap[0]);
+ : escape(cap[0])
+ : cap[0];
continue;
}
- Misc note: Some WEIRD file date/times from npmjs.com installs and here. My clock is set by NIST.
@UziTech
So now to find this change source from #1602 in the new tree structure for marked@1.1.1. Any quick knowing where? ( #1627 ...) I'll still look in the current HEAD. Will also have to compare the bug at #1601 and understand it fully.
Thanks in advance. :)
It was actually the fix in #1602 that you pointed out that makes this happen.
Here is what is happening:
- In v0.8.0 inline HTML would not be sent to the renderer so your renderer would just gets called with
'<kbd>key</kbd>'
. - #1602 fixed that so in v0.8.1+ your render gets called with
'<kbd>'
(which your renderer turns into'<kbd></kbd>'
), then'</kbd>'
(which your renderer turns into''
), and finally'<kbd></kbd>key'
The issue is that your renderer is treating inline html tags as a block of html.
Maybe we could tell the html renderer whether it is inline or not and your render function can ignore inline html tokens?
Understand what you mean now. The granularity of the html
renderer method changed and increased. This was good for for #1601 but bad for the way we sanitize, mention, and whatever else we add in the future.
A couple of things that need clarification before I attempt to answer your question.
-
So in v0.8.0- it used to be solely a Block level rendering method and in v0.8.1 it became also an Inline level rendering method. Would this be a fair conclusion to make? If yes, are there other "cross-overs" like this? :question:
-
We tried back in the day to move sanitizing out of the renderer and it hosed syntax highlighting. There's not enough time and effort to cover 100% of their classes and other attributes they add in to successfully write exceptions to filter whether transformed or ignored.
-
We tried the
text
renderer but it was splitting up certain text elements into two, or more pieces depending on other md codes floating around. i.e. misses of hyper linking. Our users/authors have a sense of humor sometimes with showing us bugs in the rendering so mentions would be clipped under certain circumstances. -
We ended up with what we've been dealing with during this issue because it was usually distinct complete blocks of HTML code. e.g.
<kbd>key</kbd>
as we've been using in this issue. The sanitizer is a must and the mentioner needs well formed HTML code or XPath won't be able to create the object needed for parsing.
So basically we've tried just about everything that we know of in positioning the sanitizer and the mentioner in various places.
So to attempt to have an answer for your question of:
Maybe we could tell the html renderer whether it is inline or not and your render function can ignore inline html tokens?
Some sort of flags
argument, or equivalent, would definitely help to have an "end stage" value but my concern is still on my question Number 1 above in this comment. If Number 1 is a "yes, that's how it works logically"... Would it be better to have an inlinehtml
render method and restore the block level html
or use some sort of flag to indicate that html
is at a different stage of the process like you mentioned? :question: i.e. wouldn't want #1601 to recur depending on the answer here.
Another note loosely related to this: When using HTML <hr class="foo">
it never hits the renderers that we extend... I even tried the hr
renderer but I presume that is for ---
md codes. Without going too far off the topic, what did I miss? If I didn't miss anything... how does this relate to the granularity of html
renderer? :question:
- Yes the
text
renderer gets inline and block text. - v1.0.0 introduced the
tokenizer
that might be a better place to do the sanitization before the tokens are created. Or you could run a function on every token to do the sanitization with thewalkTokens
option. - That could be a bug but without the offending markdown I can't be sure.
- see 2. for better places for the sanitizer and mentioner.
Would it be better to have an
inlinehtml
render method and restore the block levelhtml
Maybe but that would be a breaking change and we would like to reduce the number of breaking changes due to the number of dependents. If there is a non-breaking way it would be better to do that now and implement the breaking change later.
When using HTML
<hr class="foo">
it never hits the renderers that we extend
It does when I try it. It calls the html
renderer.
It does when I try it.
Hmmm... dev which is node@14.x atm isn't sanitizing it. On production it's sanitizing it (temporarily put it up something similar at the same section of "Debugging" under Falkon) which is node@12.x. Will look into this later. Could be another weird artifact of the breaking change from 0.8.0 to 0.8.1.
Ref:
Thanks for the newer feature recommendations and clarifications on which ones are dual. Will look into tokenizer as soon as I can.
If there is a non-breaking way it would be better to do that now and implement the breaking change later.
Understood. That's why I asked if there were any other renderers that have the same possibilities as html... you said yes so your idea is probably the best esp. if they grow. :)