node-html-parser
parse & parseNoneClosedTags invalid behaviour
Hello, @taoqf! The parseNoneClosedTags option doesn't work properly.
Malformed HTML fragment (the <span> elements are never closed):
<div>
<ul>
<li>
<a href="https://example.com">1</a>
<span class="cat-count-span">(1)
</li>
<li><a href="https://example.com">2</a><span class="cat-count-span">(1)</li>
<li><a href="https://example.com">3</a><span class="cat-count-span">(1)</li>
<li><a href="https://example.com">4</a><span class="cat-count-span">(1)</li>
<li><a href="https://example.com">5</a><span class="cat-count-span">(1)</li>
<li><a href="https://example.com">6</a><span class="cat-count-span">(1)</li>
<li><a href="https://example.com">7</a><span class="cat-count-span">(1)</li>
<li><a href="https://example.com">8</a><span class="cat-count-span">(1)</li>
</ul>
</div>
Output as corrected by the browser (from DevTools):
...
<li>
<a href="https://example.com">1</a>
<span class="cat-count-span">(1)</span>
</li>
...
const output = parse(html, {comment: false, parseNoneClosedTags: true})
Library output:
<div>
<ul>
<li>
<a href="https://example.com">1</a>
<span class="cat-count-span">(1)
<li><a href="https://example.com">2</a><span class="cat-count-span">(1)
<li><a href="https://example.com">3</a><span class="cat-count-span">(1)
<li><a href="https://example.com">4</a><span class="cat-count-span">(1)
<li><a href="https://example.com">5</a><span class="cat-count-span">(1)
<li><a href="https://example.com">6</a><span class="cat-count-span">(1)
<li><a href="https://example.com">7</a><span class="cat-count-span">(1)
<li><a href="https://example.com">8</a><span class="cat-count-span">(1)
</span></li>
</span></li></span></li></span></li></span></li></span></li></span></li></span></li></ul>
</div>
On the other hand, if I parse a large HTML document that has this "span issue" and call parse without the parseNoneClosedTags option, I get an infinite loop inside the library.
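Until this is resolved, one possible stopgap is to check the input for the "span issue" before passing it to parse. The sketch below is only a rough heuristic: countUnclosedTags is a hypothetical helper, not part of node-html-parser. It scans the markup with a regular expression, so it assumes a plain alphanumeric tag name and attribute values that do not contain ">".

```javascript
// Hypothetical helper (NOT part of node-html-parser): counts opening tags of
// the given name that are left without a matching closing tag.
// Assumptions: tagName is a plain alphanumeric name, and no attribute value
// contains a '>' character. This is a heuristic, not a real HTML parser.
function countUnclosedTags(html, tagName) {
  // Matches <tag ...>, <tag/> and </tag>; the lookahead keeps "span" from
  // matching a longer name such as "spanner".
  const pattern = new RegExp(`<(/?)${tagName}(?=[\\s>/])[^>]*>`, 'gi');
  let open = 0; // opening tags seen so far that are still unclosed
  for (const match of html.matchAll(pattern)) {
    if (match[1] === '/') {
      // Closing tag: pairs with the most recent unclosed opening tag, if any.
      if (open > 0) open -= 1;
    } else if (!match[0].endsWith('/>')) {
      // Opening tag that is not written as self-closing.
      open += 1;
    }
  }
  return open;
}

// One <li> line from the fragment above: the <span> is never closed.
const sample =
  '<li><a href="https://example.com">2</a><span class="cat-count-span">(1)</li>';
console.log(countUnclosedTags(sample, 'span')); // → 1
console.log(countUnclosedTags(sample, 'a'));    // → 0
```

A non-zero count for span would flag input like the fragment above, so a caller could skip or pre-clean such documents instead of running into the behaviour described here.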
https://github.com/taoqf/node-html-parser/issues/152