Kevin Brubeck Unhammer

Results: 390 comments by Kevin Brubeck Unhammer

This PR changes quite a few unrelated things though (style changes etc. in JSO.js). @tegner maybe better to submit one that only updates the deps, and do a separate PR...

> Neil Mitchell said that he just memory maps the file. Perhaps you
> could try that and report back?
>
> https://hackage.haskell.org/package/mmap-0.5.9/docs/System-IO-MMap.html#v:mmapFileByteString

Nice, I didn't know about that. In...
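
For readers who have not used the linked function, here is a minimal sketch of memory-mapping a file as a strict `ByteString` with `mmapFileByteString` from the `mmap` package. The file name `big.xml` and the helper `countOpenAngles` are placeholders for illustration only; this is not the parser code discussed in the thread.

```haskell
import qualified Data.ByteString.Char8 as BS
import System.IO.MMap (mmapFileByteString)

-- Hypothetical helper: count '<' bytes as a crude stand-in for "number of tags".
countOpenAngles :: BS.ByteString -> Int
countOpenAngles = BS.count '<'

main :: IO ()
main = do
  -- Passing Nothing maps the whole file; the OS pages data in on demand
  -- instead of the program reading the file into the Haskell heap.
  -- "big.xml" is an assumed file name.
  bs <- mmapFileByteString "big.xml" Nothing
  print (countOpenAngles bs)
```

Whether this gives constant memory usage in practice depends on what the consumer retains; mapping the file only avoids the up-front copy.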

A quick-and-dirty speed comparison shows `process print (\_ _ -> return ()) c c c c where c = (\_ -> return ())` taking 12s on a 1G file, where...

SAX seems to only stream on the output side – but maybe it would help with avoiding space leaks? I notice the "count elements" examples from the README using `fold`...

> Interesting! I expect most of that speed hit comes from `print` (has
> to convert the ByteString to String and then use the slow putStrLn
> which prints each...
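
To make the quoted point concrete, the sketch below contrasts `print`, which goes through `show` and therefore builds a `String`, with `Data.ByteString.Char8.putStrLn`, which writes the bytes directly. The sample value and the helper name `viaShow` are assumptions for illustration; no timings are claimed here.

```haskell
import qualified Data.ByteString.Char8 as BS

-- Hypothetical helper: what `print` does before writing, i.e. render the
-- ByteString as a quoted, escaped String via its Show instance.
viaShow :: BS.ByteString -> String
viaShow = show

main :: IO ()
main = do
  let chunk = BS.pack "<item>text</item>"  -- assumed sample data
  -- Slow path: ByteString -> String via show, then String-based putStrLn.
  print chunk
  -- Direct path: the bytes are written to stdout without building a String.
  BS.putStrLn chunk
```

Note that the two calls also differ in output: `print` adds surrounding quotes and escapes, while `BS.putStrLn` emits the raw bytes.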

Hm, I haven't noticed high memory usage with xml-conduit here, just very high cpu usage. (EDIT: If I upgrade conduit to 1.3.1, I do see https://github.com/snoyberg/conduit/issues/370 – but we've been...

Even with mmap-ing, I can't get this to have constant memory usage.
```
-rw-rw-r-- 1 unhammer unhammer 15M okt. 18 11:01 big.xml
-rw-rw-r-- 1 unhammer unhammer 337 okt. 18 11:03...
```

Would using on-the-fly PDF rendering instead of cached pdftohtml output give a smaller ~/.config/bookworm db too? Mine is already up to 1.9GB.

In my case, that's 297 PDFs and 34 HTML/TXT/EPUB files.