
Memory leak

Open glowww opened this issue 1 year ago • 3 comments

The memory used by my program keeps going up when parsing HTML. This was fixed months ago: https://github.com/rushter/selectolax/issues/90

Not sure why, but it is happening again now, even with the same version that was working fine months ago.

If you run this code, you will see that memory only goes up and is never freed.

import psutil
import requests
from selectolax.lexbor import LexborHTMLParser

response = requests.get("https://github.com")

process = psutil.Process()
start = process.memory_info().rss

for i in range(20000):
    # Parse a large document and run a CSS query on every iteration,
    # then report how much RSS has grown since the start.
    a = LexborHTMLParser(response.text * 10).css("a")
    memory_usage = int((process.memory_info().rss - start) / 1024 ** 2)
    print(f"Memory usage: {memory_usage:,}MB")

glowww · Mar 01 '24 15:03

How much memory was consumed at the peak? Honestly, it does not look like a memory leak; it looks more like the way Python preallocates memory. I got 500MB of consumed memory after 20k iterations. You can remove the css() call and still get some memory spikes.

rushter · Mar 10 '24 13:03
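A minimal sketch of the parse-only variant rushter describes above: the css() call is removed and everything else mirrors the original repro script, so any remaining growth comes from parsing itself or from Python's allocator holding on to freed blocks.

import psutil
import requests
from selectolax.lexbor import LexborHTMLParser

response = requests.get("https://github.com")
process = psutil.Process()
start = process.memory_info().rss

for i in range(20000):
    # Parse only; no CSS query is run on the resulting tree.
    LexborHTMLParser(response.text * 10)
    memory_usage = int((process.memory_info().rss - start) / 1024 ** 2)
    print(f"Memory usage: {memory_usage:,}MB")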

@lexborisov To destroy the main parser, we only need to call lxb_html_document_destroy, right?

For CSS I do:

lxb_selectors_destroy(self.selectors, True)
lxb_css_memory_destroy(self.parser.memory, True)
lxb_css_parser_destroy(self.parser, True)
lxb_css_selectors_destroy(self.css_selectors, True)

But I am not sure whether lxb_css_memory_destroy is really needed.

rushter · Mar 10 '24 13:03
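As a Python-level cross-check (a hypothetical isolation test, not posted in the thread), one could parse a single document once and then call css() repeatedly, so that any memory growth points at the CSS machinery rather than at document creation and destruction. The document string below is purely illustrative.

import psutil
from selectolax.lexbor import LexborHTMLParser

process = psutil.Process()
# Parse once; only the CSS selector path is exercised inside the loop.
tree = LexborHTMLParser("<div>" + "<a href='/x'>x</a>" * 1000 + "</div>")
start = process.memory_info().rss

for i in range(20000):
    tree.css("a")
    if i % 1000 == 0:
        growth = int((process.memory_info().rss - start) / 1024 ** 2)
        print(f"Iteration {i}: +{growth:,}MB")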

@rushter

If you create the html parser separately, it should be destroyed separately.

parser = lxb_html_parser_create();
lxb_html_parser_init(parser);

document = lxb_html_parse(parser, ...);

lxb_html_parser_unref(parser);

lxb_html_document_destroy(document);

or

document = lxb_html_document_create();
lxb_html_document_parse(document, ...);
lxb_html_document_destroy(document);

lexborisov · Mar 10 '24 14:03