enlive
StackOverflowError when parsing certain HTML
Using enlive to read certain URLs gives me a StackOverflowError, with these parts of the stack trace repeated over and over:
clojure.core/mapcat core.clj: 2660
clojure.core/apply core.clj: 630
clojure.core/seq core.clj: 137
...
clojure.core/map/fn core.clj: 2622
net.cgrand.enlive-html/zip-select-nodes*/select1/fn enlive_html.clj: 512
net.cgrand.enlive-html/zip-select-nodes*/select1 enlive_html.clj: 512
...
clojure.core/mapcat core.clj: 2660
clojure.core/apply core.clj: 630
clojure.core/seq core.clj: 137
...
clojure.core/map/fn core.clj: 2622
net.cgrand.enlive-html/zip-select-nodes*/select1/fn enlive_html.clj: 512
net.cgrand.enlive-html/zip-select-nodes*/select1 enlive_html.clj: 512
Is there any way to avoid this? Are we just naively recurring somewhere? Can this be turned into a loop/recur?
Thank you!
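To illustrate the loop/recur question, here is a minimal sketch of the general idea, not enlive's actual code. It assumes enlive-style nodes (maps with `:tag` and `:content`); the names `deep-tree`, `count-nodes-recursive`, and `count-nodes-iterative` are made up for this example. A naive recursive walk consumes one stack frame per nesting level, while a loop/recur walk keeps its pending nodes on an explicit, heap-allocated stack.

```clojure
;; Hypothetical helper: builds a tree nested n levels deep,
;; shaped like enlive nodes ({:tag ... :content [...]}).
(defn deep-tree [n]
  (reduce (fn [child _] {:tag :div :content [child]})
          "leaf"
          (range n)))

;; Naive recursion: stack depth grows with the nesting depth,
;; so very deep input can raise StackOverflowError.
(defn count-nodes-recursive [node]
  (if (map? node)
    (inc (reduce + (map count-nodes-recursive (:content node))))
    1))

;; loop/recur with an explicit stack (a vector): the pending nodes
;; live on the heap, so nesting depth no longer affects the call stack.
(defn count-nodes-iterative [root]
  (loop [stack [root]
         total 0]
    (if (empty? stack)
      total
      (let [node    (peek stack)
            pending (pop stack)]
        (recur (if (map? node)
                 (into pending (:content node))
                 pending)
               (inc total))))))

(count-nodes-recursive (deep-tree 3))      ;=> 4
(count-nodes-iterative (deep-tree 100000)) ;=> 100001
```

Whether the overflow in the trace above comes from plain recursion or from deeply nested lazy sequences (the repeated `mapcat`/`map` frames hint at the latter) would still need to be confirmed against the enlive source.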
I'm getting this as well. Digging through logs now to find some example data...
Can you provide a failing gist please?
https://gist.github.com/retnuh/9747891f2d1fb74e787b
I've stripped the Clojure down to more or less the bare bones, but haven't had time to dig through the HTML file. At first I thought it might be the STYLE tag outside the HTML tag, but a stripped-down version (i.e. with most of the body removed) works okay.
bad2.html also triggers StackOverflowError, and it happens much more quickly.
Thanks Hunter.
> but a stripped down version (i.e. most of the body removed) works okay.
The snippet seems all right, but the HTML file is too large for us to investigate. It would be very helpful if you could track down exactly where it blows up. Alternatively, try using JSoup as the parser, as it is more robust than TagSoup.
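A minimal sketch of the JSoup suggestion, assuming an enlive version with pluggable parsers (1.1.1 or later, per the README) and the `net.cgrand.jsoup` namespace on the classpath. The namespace `example.parse` and the function `parse-with-jsoup` are made-up names for illustration:

```clojure
(ns example.parse
  (:require [net.cgrand.enlive-html :as html]
            [net.cgrand.jsoup :as jsoup])
  (:import (java.io StringReader)))

;; Hypothetical helper: parses an HTML string with the JSoup-backed
;; parser instead of the default TagSoup one, for this call only.
(defn parse-with-jsoup [html-string]
  (html/html-resource (StringReader. html-string)
                      {:parser jsoup/parser}))

;; Selectors work on the resulting nodes as usual.
(html/select (parse-with-jsoup "<html><body><a href='#'>x</a></body></html>")
             [:a])
```

To switch the parser for a whole namespace rather than a single call, the README describes `(html/set-ns-parser! jsoup/parser)`. Changing the parser may not help if the overflow comes from selection over very deep trees rather than from parsing.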