mlscraper
Fuzzy text matching
For text matching specifically, something fuzzy would help reduce errors, e.g. comparing long texts by similarity instead of exact equality to avoid whitespace-based mismatches.
Options
- generic fuzzy matching for text
- passing samples that contain `StartOfText('In a country far far away')` instead of the full string, so we can match nodes whose text begins with the given text
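A rough sketch of how the two options could look, using only the standard library. None of this is existing mlscraper API: `normalize_ws`, `fuzzy_equal`, the `threshold` value and this `StartOfText` class are hypothetical names chosen for illustration.

```python
import difflib
from dataclasses import dataclass


def normalize_ws(text: str) -> str:
    # str.split() without arguments splits on any Unicode whitespace
    # (spaces, newlines, tabs, non-breaking spaces), so joining the parts
    # collapses every whitespace run into a single regular space.
    return " ".join(text.split())


def fuzzy_equal(a: str, b: str, threshold: float = 0.9) -> bool:
    # Hypothetical helper: treat two texts as equal if they are similar
    # enough after whitespace normalization. The 0.9 default is an
    # arbitrary assumption, not a tuned value.
    a, b = normalize_ws(a), normalize_ws(b)
    return difflib.SequenceMatcher(None, a, b).ratio() >= threshold


@dataclass(frozen=True)
class StartOfText:
    # Hypothetical sample marker: matches any node text that begins
    # with `prefix` (after whitespace normalization).
    prefix: str

    def matches(self, node_text: str) -> bool:
        return normalize_ws(node_text).startswith(normalize_ws(self.prefix))
```

With something like this, a sample could hold `StartOfText('In a country far far away')` and any node whose normalized text starts with that prefix would be a candidate, while `fuzzy_equal` would cover the generic case.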
This also needs to be considered when checking for correctness later, as `scraper.get(page) == expected_result` could turn out to be false even though the scraper found the right node.
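One way the correctness check could be relaxed is to compare results recursively and normalize whitespace in every string before comparing. This is only a sketch under assumptions: `loosely_equal` is a hypothetical helper, not part of mlscraper, and it assumes results are nested dicts, lists and strings.

```python
def loosely_equal(actual, expected) -> bool:
    # Hypothetical replacement for `actual == expected` that ignores
    # whitespace differences inside strings.
    if isinstance(actual, str) and isinstance(expected, str):
        # split() without arguments also splits on non-breaking spaces
        return " ".join(actual.split()) == " ".join(expected.split())
    if isinstance(actual, dict) and isinstance(expected, dict):
        return actual.keys() == expected.keys() and all(
            loosely_equal(actual[key], expected[key]) for key in expected
        )
    if isinstance(actual, (list, tuple)) and isinstance(expected, (list, tuple)):
        return len(actual) == len(expected) and all(
            loosely_equal(a, e) for a, e in zip(actual, expected)
        )
    # Anything else (numbers, None, ...) falls back to strict equality.
    return actual == expected
```

The check would then read `loosely_equal(scraper.get(page), expected_result)` instead of a plain `==`.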
#19 raised a case where it looks like a match with non-breaking spaces (`&nbsp;`) instead of regular spaces is not found.