
Thoroughly implement and test duplicate detection

Open · adbar opened this issue 6 years ago · 1 comment

  • [x] Least-recently-used (LRU) cache
  • [x] Maximum number of occurrences allowed?
  • [ ] Line / sentence / paragraph / document level?
  • [ ] Concurrency: thread-safety / multiprocessing
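The first two checklist items can be combined into one structure. The sketch below is a hypothetical illustration, not trafilatura's actual implementation. The class `LRUDuplicateDetector` and its parameters `maxsize` and `max_occurrences` are invented for this example. It keeps a bounded, least-recently-used map from a hash of each normalized text segment to the number of times that segment has been seen. A segment is flagged once it exceeds the allowed number of occurrences. The lock only covers the thread-safety half of the last item. Separate processes would each hold their own cache.

```python
from collections import OrderedDict
import hashlib
import threading


class LRUDuplicateDetector:
    """Hypothetical sketch: count how often a text segment has been seen,
    keeping only the `maxsize` most recently used segments in memory."""

    def __init__(self, maxsize: int = 1024, max_occurrences: int = 2) -> None:
        self.maxsize = maxsize
        self.max_occurrences = max_occurrences
        self._counts: "OrderedDict[str, int]" = OrderedDict()
        # Guards the cache across threads; does NOT share state between processes
        self._lock = threading.Lock()

    @staticmethod
    def _key(segment: str) -> str:
        # Normalize whitespace and case so trivial variants map to the same key
        normalized = " ".join(segment.lower().split())
        return hashlib.blake2b(normalized.encode("utf-8"), digest_size=16).hexdigest()

    def is_duplicate(self, segment: str) -> bool:
        """Record one occurrence of `segment` and return True once it has
        been seen more than `max_occurrences` times."""
        key = self._key(segment)
        with self._lock:
            count = self._counts.pop(key, 0) + 1
            self._counts[key] = count  # re-insert as most recently used
            if len(self._counts) > self.maxsize:
                self._counts.popitem(last=False)  # evict least recently used entry
            return count > self.max_occurrences


detector = LRUDuplicateDetector(maxsize=2, max_occurrences=2)
print([detector.is_duplicate("Cookie notice") for _ in range(3)])  # [False, False, True]
```

Because evicted entries lose their counts, a small `maxsize` trades memory for recall. A boilerplate line that reappears only after many other segments have pushed it out will be counted from zero again. The open question of granularity (line, sentence, paragraph, or document) comes down to what is passed in as `segment`.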

adbar, Jan 09 '20 11:01

Useful test case: https://github.com/miso-belica/jusText/issues/42

adbar, Oct 21 '21 16:10