Text escaping of $& becomes ><

Open SvanteRichter opened this issue 2 years ago • 4 comments

Hello!

I think I have found a slight issue with how text is escaped. Using this code:

import { parseHTML } from "https://esm.sh/linkedom"
const { document } = parseHTML('')
document.appendChild(document.createElement('script'))
document.querySelector("script").textContent = 'var $=1;$&&true'
console.log(document.toString())

logs <script>var $=1;><&true</script> as output. $& has been replaced by ><.

If I run similar code in a browser I get <script>var $=1;$&&true</script>. The code I ran in the browser is

var doc = document.createDocumentFragment()
doc.appendChild(document.createElement('script'))
doc.querySelector("script").textContent = 'var $=1;$&&true';
console.log(doc.querySelector("script").outerHTML)

This is a bit problematic for me, since it means that inlining my JS after SSR produces invalid code that will not run.

I have tested it in deno (v1.24.3), node (v18.2.0) and bun.sh (v0.1.11) so I don't think it's anything platform specific.

Thanks!

SvanteRichter avatar Sep 09 '22 13:09 SvanteRichter

What's document.appendChild supposed to do there? You have a body and a head, so why append scripts to the document itself? I'm asking just to be sure this isn't, yet again, an issue caused by parsing an empty document.

WebReflection avatar Sep 09 '22 22:09 WebReflection

Sorry, that was just an example, since in my SSR I first run the app and then use document.querySelector("script").textContent = myScript to inline the script. This example shows the same bug and might be more appropriate:

import { parseHTML } from "https://esm.sh/linkedom"
const { document } = parseHTML('<script>var $=1;$&&true</script>')
console.log(document.toString())

Output: <script>var $=1;><&true</script>

SvanteRichter avatar Sep 10 '22 13:09 SvanteRichter

those are two different code paths/logic ... one thing is textContent = '...' which is not changed in code (for what I could tell) the other is parseHTML with stuff that the parser might get wrong so it's a 3rd party issue ... which one is true? both?

the toString actually escape chars, don't drop these https://github.com/WebReflection/linkedom/blob/main/esm/interface/text.js#L41

WebReflection avatar Sep 10 '22 19:09 WebReflection

one thing is textContent = '...' which is not changed in code (for what I could tell)

I'm sorry, I don't quite understand what you mean. Do you mean that it's not reproducible with the code I gave or that it isn't a bug?

which one is true? both?

I don't know, but it seems to me like both code paths produce the same (seemingly wrong) output. If they are completely different code-paths then it would seem that they both have the same bug.

the toString actually escape chars, don't drop these

Sorry, I don't understand what you mean?


If it helps to clarify, it seems to happen only with script elements. For example, the following exercises both textContent assignment and HTML parsing, for both script elements and divs. Only the script elements show the bug, and they show it on both code paths:

import { parseHTML } from "https://esm.sh/linkedom"
const { document } = parseHTML('<script>$&</script><script class="t"></script><div>$&</div><div class="t2"></div>')
document.querySelector(".t").textContent = '$&'
document.querySelector(".t2").textContent = '$&'
console.log(document.toString())

output: <script>><</script><script class="t">><</script><div>$&amp;</div><div class="t2">$&amp;</div>

SvanteRichter avatar Sep 10 '22 23:09 SvanteRichter
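
A possible explanation, not confirmed anywhere in this thread: in JavaScript, String.prototype.replace treats $& inside a replacement string as "insert the matched substring". If a serializer builds script output by replacing the >< of an empty tag pair with the element's text, a $& in that text would expand to ><, which matches the outputs reported above. The sketch below only illustrates that language behavior. Both function names are hypothetical and this is not linkedom's actual source.

```javascript
// Hypothetical stand-in for a serializer that splices script text into an
// empty tag pair. This is an assumption for illustration, not linkedom code.
function serializeWithStringReplacement(text) {
  const emptyTag = '<script></script>';
  // A replacement *string* is scanned for special patterns such as $&
  // (the matched substring), so user text containing $& gets rewritten.
  return emptyTag.replace('><', '>' + text + '<');
}

// Same splice, but with a replacer function: its return value is inserted
// verbatim, with no $ pattern expansion.
function serializeWithFunctionReplacement(text) {
  const emptyTag = '<script></script>';
  return emptyTag.replace('><', () => '>' + text + '<');
}

const source = 'var $=1;$&&true';
console.log(serializeWithStringReplacement(source));   // $& expands to the match
console.log(serializeWithFunctionReplacement(source)); // text is kept as written
```

If something like the first function is what happens internally, it would also explain why div elements are unaffected: their text goes through the normal escaping path (hence $&amp;), while script text is inserted raw.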