Xeno.DOM: Heap exhausted on a 5.6M file
longlines.xml.zip
Running the attached file (↑) through xeno-dom exhausts heap memory. I just added the file to the list in SpeedBigFiles.hs as
[ benchFile ["xeno-dom"] "6MB" "longlines.xml.bz2"
and got
benchmarking 6M/xeno-dom
xeno-speed-big-files-bench: Heap exhausted;
xeno-speed-big-files-bench: Current maximum heap size is 26843545600 bytes (25600 MB).
Strangely, only minor changes to the file (e.g. sed 's/x/xx/g', which increases the file size) let it through with about 800M maxresident (as reported by /usr/bin/time). Inserting a newline after each > also gives 800M maxresident, but the problem doesn't seem to be related to the long lines, as almost any change to the file helps.
(Yes, I should be using Xeno.SAX, but why does e.g. https://dumps.wikimedia.org/nowiki/20230520/nowiki-20230520-pages-articles-multistream-index.txt.bz2 at 11M go through fine with <400M maxresident and this one not? Even with newlines removed, the wiki file works fine. This feels like a leak.)
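For anyone who wants to reproduce this outside the benchmark harness, here is a minimal sketch. It assumes the attachment has been decompressed to a plain XML file whose path is passed as the only argument; the module layout and the child-count output are just illustration, not anything from the xeno benchmarks. It only uses Xeno.DOM.parse and Xeno.DOM.children.

```haskell
module Main (main) where

import qualified Data.ByteString as BS
import System.Environment (getArgs)
import System.Exit (die)
import qualified Xeno.DOM as Xeno

-- Hypothetical standalone reproduction: parse one file with Xeno.DOM
-- and print how many child elements the root node has.
main :: IO ()
main = do
  args <- getArgs
  path <- case args of
    [p] -> pure p
    _   -> die "usage: repro FILE.xml"
  -- Read the whole file strictly, as Xeno.DOM works on a strict ByteString.
  bytes <- BS.readFile path
  case Xeno.parse bytes of
    Left err   -> die ("parse error: " ++ show err)
    Right root -> print (length (Xeno.children root))
```

If built with -rtsopts, running it with +RTS -s prints GHC's own memory statistics, which can be compared with the maxresident figure from /usr/bin/time.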
@unhammer perhaps you could try this test with the latest master? See #63.
The issue remains :(
"Damn". OK, this requires some deeper thinking.
@unhammer anyway, it's at least reassuring that the latest patch doesn't change the memory behavior of the library (kudos @mitchellwrosen)