nh3
Memory leak
Hi,
I think I've found a memory leak.
This example reproduces it:
import requests
import nh3
html = requests.get("https://search.brave.com/").text
for _ in range(30_000):
    nh3.clean(html)
If you run that alongside a tool like htop, you should see the process's memory grow continually, with no apparent bound.
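To check the growth without watching htop, you can sample the process's peak resident set size (RSS) while the loop runs. The sketch below is illustrative only: `peak_rss_mib` and `sample_peak_rss` are hypothetical helpers, and it assumes a Unix system because the standard-library `resource` module is not available on Windows. Peak RSS is a high-water mark, so it never decreases. With a leak it should keep climbing from sample to sample, while a bounded workload would be expected to level off after warm-up. `tracemalloc` is probably not a good fit here, since it only tracks allocations made through Python's own allocator and would likely miss memory allocated on the Rust side.

```python
import resource
import sys
from typing import Callable, List, Tuple


def peak_rss_mib() -> float:
    """Return this process's peak resident set size so far, in MiB."""
    peak = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss
    # ru_maxrss is reported in bytes on macOS but in KiB on Linux.
    if sys.platform == "darwin":
        return peak / (1024 * 1024)
    return peak / 1024


def sample_peak_rss(
    fn: Callable[[], object], iterations: int, every: int
) -> List[Tuple[int, float]]:
    """Call fn() `iterations` times, recording (call count, peak RSS) every `every` calls."""
    samples: List[Tuple[int, float]] = []
    for i in range(1, iterations + 1):
        fn()
        if i % every == 0:
            samples.append((i, peak_rss_mib()))
    return samples
```

With the reproduction above, something like `sample_peak_rss(lambda: nh3.clean(html), 30_000, 5_000)` returns six samples. A steadily rising second column would point to the same unbounded growth that htop shows.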
I've tried to find the root cause, but I'm not sure of my findings, and they seem pretty weird.
Bisecting nh3 with the above example gave me this:
# b5074b186b813313b258a7c97871bb2d9fc0eaa7 is the first bad commit
# commit b5074b186b813313b258a7c97871bb2d9fc0eaa7
# Author: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
# Date: Mon Apr 22 16:12:37 2024 +0800
# Bump pyo3 from 0.21.1 to 0.21.2 (#43)
# Bumps [pyo3](https://github.com/pyo3/pyo3) from 0.21.1 to 0.21.2.
# - [Release notes](https://github.com/pyo3/pyo3/releases)
# - [Changelog](https://github.com/PyO3/pyo3/blob/main/CHANGELOG.md)
# - [Commits](https://github.com/pyo3/pyo3/compare/v0.21.1...v0.21.2)
# ---
# updated-dependencies:
# - dependency-name: pyo3
# dependency-type: direct:production
# update-type: version-update:semver-patch
# ...
# Signed-off-by: dependabot[bot] <[email protected]>
# Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
# Cargo.lock | 20 ++++++++++----------
# Cargo.toml | 2 +-
Using memray also pointed to pyo3:
python3 -m memray run --native -f -o output2.bin nh.py
python3 -m memray flamegraph -f output2.bin
Thank you!