
Xeno.DOM: Heap exhausted on a 5.6M file

Open unhammer opened this issue 2 years ago • 6 comments

Running longlines.xml.zip through xeno-dom exhausts heap memory. I just put the file into the list in SpeedBigFiles.hs as benchFile ["xeno-dom"] "6MB" "longlines.xml.bz2" and got:

benchmarking 6M/xeno-dom
xeno-speed-big-files-bench: Heap exhausted;
xeno-speed-big-files-bench: Current maximum heap size is 26843545600 bytes (25600 MB).

Strangely, even minor changes to the file (e.g. sed 's/x/xx/g', which increases the file size) let it through with about 800M maxresident (as reported by /usr/bin/time). Inserting a newline after each > also gives 800M maxresident, but the problem doesn't seem to be related to the long lines, as almost any change to the file helps.
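The two perturbations described above can be reproduced with a short script. This is only a sketch under assumptions: longlines.xml is taken to be the unpacked attachment, PARSE_CMD is a hypothetical placeholder for whatever executable runs the file through Xeno.DOM (it is not part of xeno), and the -v flag assumes GNU time.

```shell
#!/bin/sh
# Sketch: build the two modified variants and compare peak memory.
# Assumptions: longlines.xml is the unpacked attachment; PARSE_CMD is a
# hypothetical command that parses one file with Xeno.DOM.

# Variant 1: double every "x", which only makes the file larger.
double_x() { sed 's/x/xx/g'; }

# Variant 2: insert a newline after every ">" (portable sed form,
# using a backslash followed by a literal newline in the replacement).
break_after_gt() { sed 's/>/>\
/g'; }

if [ -f longlines.xml ]; then
  double_x < longlines.xml > longlines-xx.xml
  break_after_gt < longlines.xml > longlines-nl.xml

  # Only measure if a parser command was supplied.
  if [ -n "${PARSE_CMD:-}" ]; then
    for f in longlines.xml longlines-xx.xml longlines-nl.xml; do
      echo "$f"
      # GNU time prints "Maximum resident set size (kbytes)" on stderr.
      /usr/bin/time -v $PARSE_CMD "$f" 2>&1 | grep 'Maximum resident set size'
    done
  fi
fi
```

If the original file really is the outlier, only the first of the three runs should hit the heap limit.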

(Yes, I should be using Xeno.SAX, but why does e.g. https://dumps.wikimedia.org/nowiki/20230520/nowiki-20230520-pages-articles-multistream-index.txt.bz2 at 11M go through fine with <400M maxresident and this one not? Even with newlines removed, the wiki file works fine. This feels like a leak.)

unhammer avatar May 22 '23 09:05 unhammer

@unhammer perhaps you could try this test with the latest master? See #63.

ocramz avatar Jun 20 '23 19:06 ocramz

The issue remains :(

unhammer avatar Jun 21 '23 06:06 unhammer

"Fy fan" (damn). OK, this requires some deeper thinking.

ocramz avatar Jun 21 '23 07:06 ocramz

@unhammer anyway, it's at least reassuring that the latest patch doesn't change the memory behavior of the library (kudos @mitchellwrosen).

ocramz avatar Jun 21 '23 16:06 ocramz