linkedom
Text escaping of $& becomes ><
Hello!
I think I have found a slight issue with how text is escaped. Using this code:
import { parseHTML } from "https://esm.sh/linkedom"
const { document } = parseHTML('')
document.appendChild(document.createElement('script'))
document.querySelector("script").textContent = 'var $=1;$&&true'
console.log(document.toString())
logs <script>var $=1;><&true</script> as output: $& has been replaced by ><.
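That exact output is what JavaScript's String.prototype.replace produces when text is spliced into markup through a replacement string, because in a replacement string $& means "insert the matched substring". The snippet below is a hypothetical sketch. It assumes the serializer fills an empty <script></script> by replacing its >< with '>' + text + '<' (this has not been checked against linkedom's source). Under that assumption it reproduces the reported output and shows two ways to keep the text literal.

```javascript
// Hypothetical illustration, NOT linkedom's actual code: splice text into an
// empty element by replacing the "><" between its open and close tags.
const emptyTag = '<script></script>';
const text = 'var $=1;$&&true';

// Buggy: a string replacement is scanned for special patterns, so "$&"
// expands to the matched substring "><". "$=" is not a pattern and stays as-is.
const buggy = emptyTag.replace('><', '>' + text + '<');

// Fix 1: a replacer function's return value is inserted literally.
const safeFn = emptyTag.replace('><', () => '>' + text + '<');

// Fix 2: double every "$" first, so each "$$" collapses back to a single "$".
const safeEscaped = emptyTag.replace('><', '>' + text.replaceAll('$', '$$$$') + '<');

console.log(buggy);       // <script>var $=1;><&true</script>
console.log(safeFn);      // <script>var $=1;$&&true</script>
console.log(safeEscaped); // <script>var $=1;$&&true</script>
```

The same assumption also explains the minimal case: replacing >< with '>$&<' yields <script>><</script>. A replacer function is usually the simpler fix, since its result is never scanned for $ patterns ($$, $&, $`, $', $n, $<name>).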
If I run similar code in a browser, I get <script>var $=1;$&&true</script>. The code I ran in the browser is:
var doc = document.createDocumentFragment()
doc.appendChild(document.createElement('script'))
doc.querySelector("script").textContent = 'var $=1;$&&true';
console.log(doc.querySelector("script").outerHTML)
This is a bit problematic for me, since it means that inlining my JS after SSR produces invalid code that will not run.
I have tested it in Deno (v1.24.3), Node (v18.2.0) and bun.sh (v0.1.11), so I don't think it's anything platform-specific.
Thanks!
What's document.appendChild supposed to do there? You have a body and a head, so why append scripts to the document itself? I just want to be sure this isn't, yet again, an issue caused by parsing empty documents.
Sorry, that was just an example; in my SSR setup I first run the app and then use document.querySelector("script").textContent = myScript to inline the script. This example shows the same bug and might be more appropriate:
import { parseHTML } from "https://esm.sh/linkedom"
const { document } = parseHTML('<script>var $=1;$&&true</script>')
console.log(document.toString())
Output: <script>var $=1;><&true</script>
Those are two different code paths. One is textContent = '...', which, as far as I can tell, doesn't change the content; the other is parseHTML, where the parser might get things wrong, which would make it a third-party issue. Which one is it? Both?
toString actually escapes characters; it doesn't drop them: https://github.com/WebReflection/linkedom/blob/main/esm/interface/text.js#L41
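For comparison with the browser output, the WHATWG HTML fragment serialization algorithm escapes a text node only when its parent is not a raw-text element (style, script, xmp, iframe, noembed, noframes, plaintext, plus noscript when scripting is enabled). Script text is therefore emitted verbatim. The sketch below encodes that rule; serializeText and the constant names are hypothetical, not linkedom APIs, and the noscript case is left out for brevity.

```javascript
// Hypothetical helper (not a linkedom API) sketching the WHATWG rule for
// serializing a text node's data, given its parent element's tag name.
const RAW_TEXT_PARENTS = new Set([
  'style', 'script', 'xmp', 'iframe', 'noembed', 'noframes', 'plaintext',
]);

// Escapes applied to text outside raw-text elements.
const TEXT_ESCAPES = { '&': '&amp;', '\u00a0': '&nbsp;', '<': '&lt;', '>': '&gt;' };

function serializeText(parentTagName, data) {
  // Raw-text parents: emit the data literally, with no escaping at all.
  if (RAW_TEXT_PARENTS.has(parentTagName.toLowerCase())) return data;
  // Everywhere else: escape &, U+00A0, < and > one character at a time.
  return data.replace(/[&\u00a0<>]/g, (ch) => TEXT_ESCAPES[ch]);
}

console.log(serializeText('script', 'var $=1;$&&true')); // var $=1;$&&true
console.log(serializeText('div', 'a < b && c'));        // a &lt; b &amp;&amp; c
```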
"one thing is textContent = '...' which is not changed in code (for what I could tell)"
I'm sorry, I don't quite understand what you mean. Do you mean that it's not reproducible with the code I gave or that it isn't a bug?
"which one is true? both?"
I don't know, but it seems to me that both code paths produce the same (seemingly wrong) output. If they really are completely different code paths, then it would seem that they both have the same bug.
"the toString actually escape chars, don't drop these"
Sorry, I don't understand what you mean.
If it helps clarify: it seems to happen only with script elements. For example, the following code exercises both textContent and HTML parsing, for both script elements and divs, and the bug appears only with the script elements (where it occurs in both cases):
import { parseHTML } from "https://esm.sh/linkedom"
const { document } = parseHTML('<script>$&</script><script class="t"></script><div>$&</div><div class="t2"></div>')
document.querySelector(".t").textContent = '$&'
document.querySelector(".t2").textContent = '$&'
console.log(document.toString())
Output: <script>><</script><script class="t">><</script><div>$&</div><div class="t2">$&</div>