mlscraper

Fuzzy text matching

Open: lorey opened this issue 2 years ago • 1 comment

For text matching specifically, some form of fuzzy matching would help reduce errors, e.g. comparing long texts by similarity rather than exact equality so that whitespace differences don't cause a mismatch.
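A minimal sketch of what generic fuzzy matching could look like, assuming plain Python strings and the standard-library difflib module. The names normalize_whitespace and fuzzy_text_match and the 0.9 threshold are hypothetical choices for illustration, not part of mlscraper:

```python
from difflib import SequenceMatcher


def normalize_whitespace(text: str) -> str:
    # str.split() with no arguments splits on any Unicode whitespace
    # (spaces, tabs, newlines, non-breaking spaces) and drops empty parts
    return " ".join(text.split())


def fuzzy_text_match(expected: str, actual: str, threshold: float = 0.9) -> bool:
    """Hypothetical helper: True if the two texts are 'similar enough'."""
    a = normalize_whitespace(expected)
    b = normalize_whitespace(actual)
    if a == b:
        return True
    # ratio() is 2*M/T, where M is the number of matched characters
    # and T is the combined length of both strings
    return SequenceMatcher(None, a, b).ratio() >= threshold


sample = "In a country far far away there lived a king."
node_text = "In a country\n   far far away\tthere lived a king ."
print(fuzzy_text_match(sample, node_text))  # True
```

Normalizing whitespace first handles the common case cheaply; the similarity ratio then only has to absorb small leftover differences such as the extra space before the period.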

Options

  • generic fuzzy matching for text
  • passing samples that contain StartOfText('In a country far far away') instead of the full string, so that we can match nodes whose text begins with the given string
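The second option could be sketched as a small marker class that a sample holds instead of a plain string. StartOfText is the name suggested in this issue, but the class body, its matches method, the sample_matches helper, and the whitespace normalization below are assumptions for illustration, not existing mlscraper API:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class StartOfText:
    """Hypothetical sample value: matches any text that begins with `prefix`."""

    prefix: str

    def matches(self, text: str) -> bool:
        # collapse whitespace on both sides so line breaks or repeated
        # spaces in the page don't prevent a match
        norm_text = " ".join(text.split())
        norm_prefix = " ".join(self.prefix.split())
        return norm_text.startswith(norm_prefix)


def sample_matches(sample_value, node_text: str) -> bool:
    # plain strings keep exact matching; StartOfText opts into prefix matching
    if isinstance(sample_value, StartOfText):
        return sample_value.matches(node_text)
    return sample_value == node_text


story = "In a country far far away,\n  there lived a king who..."
print(sample_matches(StartOfText("In a country far far away"), story))  # True
print(sample_matches("In a country far far away", story))  # False
```

Making the fuzziness explicit per sample keeps exact matching as the default, so existing samples behave as before.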

This also needs to be taken into account when checking correctness later, since scraper.get(page) == expected_result could then evaluate to false even though the scraper found the right nodes.
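If samples can be fuzzy, a later correctness check would need to compare results with the same tolerance instead of plain ==. A minimal sketch, assuming the scraper returns a flat dict of strings (nested results are not handled) and using whitespace-normalized difflib similarity; results_match and _similar are hypothetical helpers:

```python
from difflib import SequenceMatcher


def _similar(a: str, b: str, threshold: float) -> bool:
    # normalize whitespace, then fall back to a similarity ratio
    a, b = " ".join(a.split()), " ".join(b.split())
    return a == b or SequenceMatcher(None, a, b).ratio() >= threshold


def results_match(actual: dict, expected: dict, threshold: float = 0.9) -> bool:
    """Hypothetical check: like actual == expected, but tolerant for strings."""
    if actual.keys() != expected.keys():
        return False
    for key, expected_value in expected.items():
        actual_value = actual[key]
        if isinstance(expected_value, str) and isinstance(actual_value, str):
            if not _similar(actual_value, expected_value, threshold):
                return False
        elif actual_value != expected_value:
            return False
    return True


expected_result = {"title": "Far far away", "author": "lorey"}
scraped = {"title": "Far\xa0far away\n", "author": "lorey"}
print(scraped == expected_result)  # False
print(results_match(scraped, expected_result))  # True
```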

lorey, Jun 21 '22 14:06

#19 raised a case where a match apparently is not found because the page text contains &nbsp; (non-breaking spaces) instead of regular spaces.
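Text containing non-breaking spaces (U+00A0, written as &nbsp; in HTML) does not compare equal to the same text with regular spaces. One possible fix, sketched here as an assumption rather than how mlscraper currently behaves, is to normalize both sides before comparing; Unicode NFKC normalization maps U+00A0 to an ordinary space. The normalize_text helper is hypothetical:

```python
import unicodedata


def normalize_text(text: str) -> str:
    # NFKC folds compatibility characters, e.g. U+00A0 (no-break space) -> U+0020
    text = unicodedata.normalize("NFKC", text)
    # collapse runs of whitespace and strip leading/trailing whitespace
    return " ".join(text.split())


page_text = "Hello\u00a0world"  # an HTML parser typically decodes "Hello&nbsp;world" to this
sample_text = "Hello world"

print(page_text == sample_text)  # False
print(normalize_text(page_text) == normalize_text(sample_text))  # True
```

Applying the same normalization to both the sample and the node text keeps the comparison symmetric, so it doesn't matter which side contains the non-breaking space.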

lorey, Jul 07 '22 16:07