node-htmlparser bug when parsing a <script> tag used by a template system

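var htmlparser = require('htmlparser'),
    util = require('util'),
    // The callback is a no-op; the parsed DOM is read from handler.dom below.
    handler = new htmlparser.DefaultHandler(function(err, dom){}),
    parser = new htmlparser.Parser(handler),
    // A template stored in a <script> tag, with no whitespace before <h1>.
    rawHtml = '<script type="text/template"><h1>Heading1</h1></script>';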

parser.parseComplete(rawHtml);
console.log(util.inspect(handler.dom, false, null));

This code drops the leading "<" of <h1> and outputs:

[ { raw: 'script type="text/template"',
    data: 'script type="text/template"',
    type: 'script',
    name: 'script',
    attribs: { type: 'text/template' },
    children: 
     [ { raw: 'h1>Heading1</h1>',  // the leading "<" is missing
         data: 'h1>Heading1</h1>',
         type: 'text' } ] } ]

Sep 01 '11 06:09 ghostoy

The funny thing is that if you add a space between the script tag and the h1 tag, it actually works: https://github.com/FB55/node-htmlparser/blob/master/tests/23-template_script_tags.js

Oct 25 '11 20:10 fb55

Nothing funny about it, @FB55. The problem is deep inside parseTags(): it consumes the first less-than symbol following any tag, including the script tag, and only then goes back (correctly) into text-parsing mode to handle the rest of the template.
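
As a rough illustration of that failure mode, here is a toy scanner. It is not node-htmlparser's actual parseTags() code: the function name scanScript and its buggy flag are hypothetical, and the sketch assumes the "<" is eaten only when it immediately follows the opening tag.

```javascript
// Toy scanner, NOT node-htmlparser's real implementation. It only models
// the symptom: one "<" swallowed right after the opening <script ...> tag.
function scanScript(html, buggy) {
  var open = html.indexOf('>') + 1;             // just past the opening tag's ">"
  var close = html.indexOf('</script>', open);  // start of the closing tag
  var start = open;
  if (buggy && html.charAt(start) === '<') {
    start += 1; // assumed bug: the "<" after the tag is consumed as a tag opener
  }
  return {
    raw: html.slice(1, open - 1),   // tag contents without "<" and ">"
    text: html.slice(start, close)  // everything treated as script text
  };
}

var rawHtml = '<script type="text/template"><h1>Heading1</h1></script>';
console.log(scanScript(rawHtml, true).text);  // h1>Heading1</h1>
console.log(scanScript(rawHtml, false).text); // <h1>Heading1</h1>
```

With the flag off, the whole template survives as text, which is the behaviour the original report expected.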

Nov 11 '11 22:11 elfsternberg

I fixed the bug in my own fork; the test linked above now passes without a problem (the additional space was removed).

Nov 12 '11 11:11 fb55