rm non_word_boundaries

Open Yvette-Wang opened this issue 4 years ago • 1 comments

Yvette-Wang avatar Nov 25 '19 07:11 Yvette-Wang

I don't think this helps flashtext's intended use. Removing word boundaries entirely will only produce "words found in words" results, where a keyword is matched inside a longer word. As seen here:

from flashtext import KeywordProcessor
kp = KeywordProcessor()
kp.add_keywords_from_list(['cat', 'catch'])
kp.non_word_boundaries = "_"

text = 'Try to catch this.'
print(kp.extract_keywords(text))
# output I get: ['cat']

Since flashtext stops at the first hit, it won't even find "catch".
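
To show the "words found in words" effect without depending on flashtext internals, here is a minimal sketch in plain Python. The helper names `find_without_boundaries` and `find_with_boundaries` are made up for illustration, and the regex-based matching is an assumption for the demo, not how flashtext matches keywords.

```python
import re

keywords = ["cat", "catch"]
text = "Try to catch this."


def find_without_boundaries(text, keywords):
    """Return every keyword that occurs anywhere in the text,
    including inside a longer word (no boundary check at all)."""
    return [kw for kw in keywords if kw in text]


def find_with_boundaries(text, keywords):
    """Return only keywords that occur as whole words,
    using the regex \\b word-boundary assertion."""
    return [
        kw for kw in keywords
        if re.search(r"\b" + re.escape(kw) + r"\b", text)
    ]


# Without boundaries, "cat" is reported even though the text only contains "catch".
print(find_without_boundaries(text, keywords))
# With boundaries, only the whole word "catch" is reported.
print(find_with_boundaries(text, keywords))
```

Without a boundary check, "cat" counts as a hit purely because it is a prefix of "catch", which is the kind of false positive word boundaries are meant to prevent.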

iwpnd avatar Nov 25 '19 07:11 iwpnd