Bug when parsing a <script> tag that contains a template
var htmlparser = require('htmlparser'),
    util = require('util'),
    handler = new htmlparser.DefaultHandler(function(err, dom){}),
    parser = new htmlparser.Parser(handler),
    rawHtml = '<script type="text/template"><h1>Heading1</h1></script>';
parser.parseComplete(rawHtml);
console.log(util.inspect(handler.dom, false, null));
This code discards the leading "<" of <h1> and outputs:
[ { raw: 'script type="text/template"',
    data: 'script type="text/template"',
    type: 'script',
    name: 'script',
    attribs: { type: 'text/template' },
    children:
     [ { raw: 'h1>Heading1</h1>', // discard <
         data: 'h1>Heading1</h1>',
         type: 'text' } ] } ]
The funny thing is that, if you add a space between the script and the h1-tag, it actually works: https://github.com/FB55/node-htmlparser/blob/master/tests/23-template_script_tags.js
Nothing funny about it, @FB55. The problem is deep inside parseTags(): it consumes the first less-than symbol following any tag, including the script tag, before correctly switching back into text-parsing mode to handle the rest of the template.
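To make the failure mode concrete, here is a toy sketch of the behaviour described above. It is not the code from parseTags(); the function name scriptBody and its eager flag are hypothetical and exist only for illustration. It also shows why a space after the script tag hides the problem: the character right after the tag is then no longer a "<".

```javascript
// Toy scanner, NOT htmlparser's actual parseTags(). It only illustrates how
// eagerly consuming a "<" can drop it from the text inside a script element.
function scriptBody(html, eager) {
  var open = html.indexOf('>') + 1;          // just past the opening <script ...> tag
  var close = html.lastIndexOf('</script>'); // start of the closing tag
  var pos = open;
  if (eager && html.charAt(pos) === '<') {
    // Buggy path: the "<" directly after the tag is consumed as a tag start
    // before the scanner notices that it is inside a script element.
    pos += 1;
  }
  // Everything up to </script> is then treated as plain text.
  return html.slice(pos, close);
}

var rawHtml = '<script type="text/template"><h1>Heading1</h1></script>';
var spaced = '<script type="text/template"> <h1>Heading1</h1></script>';

console.log(scriptBody(rawHtml, true));  // "<" is lost
console.log(scriptBody(spaced, true));   // the space protects the "<"
console.log(scriptBody(rawHtml, false)); // "<" is kept
```

A fix along these lines would check whether the parser is inside a script element before consuming the "<", rather than after.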
I fixed the bug in my own fork. With the additional space removed, the test linked above passes without a problem.