trafilatura
memory: handling of `lru_cache`
See discussions in https://github.com/adbar/htmldate/issues/56 and https://github.com/adbar/htmldate/issues/57.
- [x] check all functions using such a cache
- [x] in trafilatura, htmldate and courlan
- [x] in all the underlying libraries
- [x] write a `reset_caches()` function which calls `function_name.cache_clear()` on all concerned functions
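The idea behind the last checklist item can be sketched in plain Python. This is a minimal illustration, not the library's actual code: the cached functions `normalize_url` and `count_tokens` and the registry `CACHED_FUNCTIONS` are hypothetical stand-ins for the functions in trafilatura, htmldate and courlan that use `functools.lru_cache`. The sketch relies only on the standard `cache_clear()` and `cache_info()` methods that `lru_cache` attaches to decorated functions.

```python
from functools import lru_cache

# Hypothetical cached helpers standing in for the real library functions
# that are decorated with lru_cache; the names are illustrative only.
@lru_cache(maxsize=1024)
def normalize_url(url: str) -> str:
    # Cheap stand-in for an expensive, cacheable computation
    return url.strip().lower()

@lru_cache(maxsize=1024)
def count_tokens(text: str) -> int:
    return len(text.split())

# Hypothetical registry of every lru_cache-decorated function to be reset
CACHED_FUNCTIONS = (normalize_url, count_tokens)

def reset_caches() -> None:
    """Empty the LRU caches of all registered functions to release memory."""
    for func in CACHED_FUNCTIONS:
        func.cache_clear()
```

After a few calls, `normalize_url.cache_info().currsize` reports the number of stored entries; after `reset_caches()` it drops back to 0. Keeping an explicit registry is one simple design choice, assumed here because it makes the set of cleared caches easy to audit.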
After installing the latest version straight from the repository, the new function can be called as follows whenever one sees fit, for example after every n processed URLs:
```python
from trafilatura.meta import reset_caches

# at any given point
reset_caches()
```
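One way to apply the "every n URLs" suggestion is to wrap the processing loop so that the reset happens at a fixed interval. The sketch below is hypothetical: `process_in_batches`, the `handle` callback and the default interval of 1000 URLs are assumptions for illustration. The reset function is passed in as a parameter so the loop does not depend on any particular library.

```python
from typing import Callable, Iterable, List, TypeVar

T = TypeVar("T")

def process_in_batches(
    urls: Iterable[str],
    handle: Callable[[str], T],
    reset: Callable[[], None],
    every: int = 1000,  # assumed interval, tune to the available memory
) -> List[T]:
    """Apply handle() to each URL and call reset() after every `every` URLs."""
    if every <= 0:
        raise ValueError("every must be a positive integer")
    results: List[T] = []
    for count, url in enumerate(urls, start=1):
        results.append(handle(url))
        # Periodically clear the caches so memory use does not keep growing
        if count % every == 0:
            reset()
    return results
```

In a real run, one would pass `reset_caches` imported from `trafilatura.meta` as `reset`, together with one's own download-and-extract function as `handle`.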