node-html-parser
Bug when parsing a <![CDATA[]]> section that contains <> (angle brackets)
How to reproduce the issue:
import { parse } from "node-html-parser";
console.log(
parse(
`<ac:structured-macro
ac:name="code"
ac:schema-version="1"
ac:macro-id="some id">
<ac:parameter ac:name="language">bash</ac:parameter>
<ac:plain-text-body>
<![CDATA[
export AWS_ACCESS_KEY_ID=<your Access key ID> export AWS_SECRET_ACCESS_KEY=<your Secret access key>
]]>
</ac:plain-text-body>
</ac:structured-macro>
<p><br/></p>`
).toString()
);
The output of this program is:
<ac:structured-macro ac:name="code"
ac:schema-version="1"
ac:macro-id="some id">
<ac:parameter ac:name="language">bash</ac:parameter>
<![CDATA[
export AWS_ACCESS_KEY_ID=<your Access key ID> export AWS_SECRET_ACCESS_KEY=</your>
]]>
<p><br></p></ac:structured-macro>
The problem is that the parser corrupts the content of the CDATA section (note the spurious </your>) and also gets confused about the rest of the HTML. It completely swallows the ac:plain-text-body tag, and it moves the closing </ac:structured-macro> tag, which should come immediately after ac:plain-text-body, to the end of the HTML.
If I remove the angle brackets <> from the content of the CDATA section, the HTML is parsed and printed correctly.
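Since the input parses correctly once the angle brackets are gone, one possible stopgap is to hide each CDATA section behind a bracket-free token before parsing and put it back afterwards. The following is only a minimal sketch of that idea, not a fix in the library: `protectCdata`, `restoreCdata` and the `__CDATA_PLACEHOLDER_n__` token format are hypothetical names that are not part of node-html-parser, and the sketch assumes the token text never occurs in the real input.

```typescript
// Hypothetical helpers, not part of node-html-parser.
interface ProtectedHtml {
  html: string;       // input with each CDATA section swapped for a token
  sections: string[]; // original CDATA sections, in order of appearance
}

// Matches a whole CDATA section, including its delimiters (non-greedy).
const CDATA_PATTERN = /<!\[CDATA\[[\s\S]*?\]\]>/g;

// Replace every CDATA section with a token that contains no angle brackets.
function protectCdata(html: string): ProtectedHtml {
  const sections: string[] = [];
  const replaced = html.replace(CDATA_PATTERN, (match: string) => {
    sections.push(match);
    return `__CDATA_PLACEHOLDER_${sections.length - 1}__`;
  });
  return { html: replaced, sections };
}

// Swap each token back for the CDATA section it stands for.
// Tokens with an unknown index are left untouched.
function restoreCdata(html: string, sections: string[]): string {
  return html.replace(
    /__CDATA_PLACEHOLDER_(\d+)__/g,
    (token: string, index: string) => {
      const original = sections[Number(index)];
      return original === undefined ? token : original;
    }
  );
}
```

With the reproduction above, the idea would be to pass the `html` field returned by `protectCdata` to `parse(...)` and then call `restoreCdata` on the result of `.toString()`. I have not verified that the parser then keeps ac:plain-text-body intact, so treat this as a workaround to try rather than a confirmed fix.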
Expected results:
The parser should not try to interpret angle brackets inside a CDATA section, and it should parse the surrounding HTML correctly.
Note:
This is just a small part of a large HTML page, all of which gets corrupted because of this bug.