Results 3 issues of Ryan

cf this page http://ccbdrfc.tripod.com/mpgallery.html

Related discussion http://sourceforge.net/p/nekohtml/bugs/123/

```
java.lang.StackOverflowError
    at java.util.ArrayList.<init>(ArrayList.java:177)
    at org.cyberneko.html.HTMLTagBalancer.consumeBufferedEndElements(HTMLTagBalancer.java:506)
    at org.cyberneko.html.HTMLTagBalancer.startElement(HTMLTagBalancer.java:589)
    at org.cyberneko.html.HTMLTagBalancer.forceStartElement(HTMLTagBalancer.java:760)
    at org.cyberneko.html.HTMLTagBalancer.startElement(HTMLTagBalancer.java:637)
    at org.cyberneko.html.HTMLTagBalancer.forceStartElement(HTMLTagBalancer.java:760)
    at org.cyberneko.html.HTMLTagBalancer.endElement(HTMLTagBalancer.java:1002)
    at org.cyberneko.html.HTMLTagBalancer.endElement(HTMLTagBalancer.java:1003)
    at org.cyberneko.html.HTMLTagBalancer.endElement(HTMLTagBalancer.java:1003)
    at org.cyberneko.html.HTMLTagBalancer.endElement(HTMLTagBalancer.java:1003)
    at...
```

When the number of distinct items is larger than the maximum itemset size, this library gives incorrect results. I found this in two different ways: 1.) comparing against the brute-force...

```
(crawl2) rbox@Mac crawlers % pip install crawl4ai
Collecting crawl4ai
  Using cached Crawl4AI-0.3.742-py3-none-any.whl.metadata (24 kB)
Collecting aiosqlite~=0.20 (from crawl4ai)
  Using cached aiosqlite-0.20.0-py3-none-any.whl.metadata (4.3 kB)
Collecting html2text~=2024.2 (from crawl4ai)
  Using cached...
```

🐞 Bug