Anton Bachin
> The output is: `Reachable words: 24033120`.

That implies that the Lambda Soup DOM is about 190 MB in size. Are there close to 24M live words at this point?
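For reference, the word-to-byte arithmetic behind that estimate can be sketched as follows. This assumes the count came from something like `Obj.reachable_words` on a 64-bit system; `bytes_of_words` and the `sample` list are hypothetical names used only for illustration, not part of Lambda Soup.

```ocaml
(* Convert a count of heap words into bytes.
   Sys.word_size is given in bits (64 on a 64-bit system). *)
let bytes_of_words words = words * (Sys.word_size / 8)

let () =
  (* The word count quoted in the thread. *)
  let words = 24_033_120 in
  let bytes = bytes_of_words words in
  Printf.printf "%d words = %d bytes (about %.0f MB)\n"
    words bytes (float_of_int bytes /. 1e6)

let () =
  (* The same kind of measurement on a small value; [sample] is only a
     stand-in for a real DOM. *)
  let sample = List.init 1000 string_of_int in
  Printf.printf "sample: %d reachable words\n"
    (Obj.reachable_words (Obj.repr sample))
```

With 8-byte words, 24033120 words comes to 192264960 bytes, which is consistent with the "about 190 MB" figure.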
I also just realized that for single-character text nodes, the size blowup is about 80x relative to their in-file representation, due to all the overhead. The 13M file has a...
> `Markup.channel |> Markup.drain`

Were you literally draining the character stream returned by `Markup.channel`? Or did you actually do

```
Markup.channel |> Markup.parse_html |> Markup.signals |> Markup.drain
```

i.e. actually...
It's definitely possible to reduce the memory consumption of the DOM by at least a constant factor, like 2x or 1.5x. If willing to do more extensive and invasive optimizations,...
> Also, that page seems nearly impossible to open in browser in practice

It's good to hear. I've generally found that Markup.ml (and Lambda Soup) have the same behavior as...
As of now, I think:

- Lambda Soup memory usage can easily be reduced by removing the `self` field. A slightly trickier change is inlining the text field, and using...
> * We could hash-cons the string values of text nodes and element names to reuse many blocks. (also known as string pooling)
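As a rough illustration of the string pooling idea, here is a minimal sketch using only the OCaml standard library. The names `pool` and `intern` are hypothetical and not part of Lambda Soup or Markup.ml; the sketch also assumes immutable strings, so that sharing one block between many nodes is safe.

```ocaml
(* A minimal string pool: maps each distinct string to one shared copy. *)
let pool : (string, string) Hashtbl.t = Hashtbl.create 1024

(* Return the pooled copy of [s], adding [s] to the pool on first sight. *)
let intern (s : string) : string =
  match Hashtbl.find_opt pool s with
  | Some shared -> shared
  | None ->
    Hashtbl.add pool s s;
    s

let () =
  (* Two separately allocated but equal strings, as a parser might
     produce for repeated element names. *)
  let a = String.concat "" ["d"; "iv"] in
  let b = String.concat "" ["di"; "v"] in
  (* Equal contents, but distinct heap blocks. *)
  assert (a = b && a != b);
  (* After pooling, both resolve to the same block. *)
  assert (intern a == intern b)
```

One design caveat: a plain `Hashtbl` holds strong references, so pooled strings are never collected while the pool is alive. A weak table built with `Weak.Make` might be a better fit if documents are parsed and discarded repeatedly.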
Yes, it should be fairly straightforward. One would have to:

1. Extend the grammar of selectors with one more level: https://github.com/aantron/lambda-soup/blob/8084d5b86ce8f1223271fc1e67398ac618dacbda/src/soup.ml#L489

   `simple_selector` is stuff like `.class-foo`, `[attribute-bar]`, combinators are `>`,...
There is no really good reason at this point. The original reason was that this library was first written on top of Ocamlnet's `Nethtml` parser. As you can see, [its API](http://projects.camlcity.org/projects/dl/ocamlnet-4.1.2/doc/html-main/Nethtml.html) doesn't...
Is this something you would like to see added?