
memory: handling of `lru_cache`

Open adbar opened this issue 2 years ago • 2 comments

See discussions in https://github.com/adbar/htmldate/issues/56 and https://github.com/adbar/htmldate/issues/57.

adbar avatar Jul 01 '22 15:07 adbar

  • [x] check all functions using such a cache
    • [x] in trafilatura, htmldate and courlan
    • [x] in all the underlying libraries
  • [x] write a `reset_caches()` function which calls `function_name.cache_clear()` on all concerned functions
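
The last item in the checklist can be illustrated with a minimal sketch. It is not the code that ended up in the library: `normalize_url`, `parse_year`, `CACHED_FUNCTIONS` and `reset_all_caches` are hypothetical names made up for this example. The only real API used is `functools.lru_cache` with its `cache_clear()` and `cache_info()` methods.

```python
from functools import lru_cache


# Hypothetical cached helpers standing in for the real library functions.
@lru_cache(maxsize=1024)
def normalize_url(url: str) -> str:
    return url.strip().lower()


@lru_cache(maxsize=1024)
def parse_year(text: str) -> int:
    return int(text[:4])


# Every cached function that should be reset together is listed here,
# so adding a new cache only requires one more entry.
CACHED_FUNCTIONS = (normalize_url, parse_year)


def reset_all_caches() -> None:
    """Release the memory held by all registered LRU caches."""
    for func in CACHED_FUNCTIONS:
        func.cache_clear()
```

Calling `normalize_url.cache_info()` before and after `reset_all_caches()` shows the number of stored entries dropping back to zero.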

adbar avatar Jul 04 '22 15:07 adbar

After installing the latest version straight from the repository, the new function can be called at any point, for example after every n URLs processed:

```python
from trafilatura.meta import reset_caches

# at any given point
reset_caches()
```
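
For the "every n URLs" case, a loop along the following lines might work. This is only a sketch: `process_urls`, `handle`, `reset` and `every` are hypothetical names chosen for the example, and the reset callback is passed in as an argument so that the snippet runs without trafilatura installed.

```python
from typing import Callable, Iterable, List


def process_urls(
    urls: Iterable[str],
    handle: Callable[[str], object],
    reset: Callable[[], None],
    every: int = 1000,
) -> List[object]:
    """Apply `handle` to each URL and call `reset` after every `every` URLs."""
    results = []
    for count, url in enumerate(urls, start=1):
        results.append(handle(url))
        # Clear the caches periodically to keep memory usage bounded.
        if count % every == 0:
            reset()
    return results
```

In practice one would pass `reset_caches` (imported from `trafilatura.meta` as shown above) as the `reset` argument and the extraction routine as `handle`. A suitable value for `every` depends on the workload and available memory.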

adbar avatar Aug 01 '22 15:08 adbar