
Lack of reference to fuzzy matching.

Open mrkkollo opened this issue 4 years ago • 3 comments

A commit added support for fuzzy matching through the "max_cost" argument of extract_keywords, but neither the README nor the documentation mentions it. As a result, many people probably don't know the feature is available.

mrkkollo avatar Jun 25 '20 12:06 mrkkollo

It's not a good idea to use flashtext with the max_cost argument. We tested it and it is much slower than fuzzywuzzy. For fuzzy matching, I would recommend using fuzzywuzzy.


olgnaydn avatar Jun 25 '20 13:06 olgnaydn

Hi, I implemented the "fuzziness" feature for flashtext. Benchmarks are not included, and I agree it is lacking documentation.

Amongst other things, there is a need to make it "smarter", and, perhaps, faster.

@olgnaydn do you have an example that supports your claim that fuzzywuzzy is more suitable when performance matters? From what I know, fuzzywuzzy is not designed for multi-word matching, but I may be wrong.

remiadon avatar Jul 29 '20 12:07 remiadon

Hi, where can I find the max_cost argument?

shivampuri20 avatar May 19 '21 06:05 shivampuri20
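
For anyone landing here with the same question: per the posts above, "max_cost" is a keyword argument of extract_keywords. The sketch below is not from the maintainers. It assumes the argument may be missing from the flashtext version you have installed, so it inspects the signature first. The helper names `supports_max_cost` and `extract_fuzzy` are hypothetical and exist only for this illustration.

```python
import inspect


def supports_max_cost():
    """Return True if the installed flashtext accepts max_cost in extract_keywords."""
    try:
        from flashtext import KeywordProcessor
    except ImportError:
        # flashtext is not installed in this environment
        return False
    params = inspect.signature(KeywordProcessor.extract_keywords).parameters
    return "max_cost" in params


def extract_fuzzy(keywords, sentence, max_cost=1):
    """Hypothetical helper: build a processor and extract keywords with a cost budget.

    Only call this when supports_max_cost() is True; older versions will
    raise TypeError on the unknown keyword argument.
    """
    from flashtext import KeywordProcessor

    processor = KeywordProcessor()
    for keyword in keywords:
        processor.add_keyword(keyword)
    # max_cost is the fuzzy-matching argument discussed in this issue
    return processor.extract_keywords(sentence, max_cost=max_cost)


def demo():
    """Print fuzzy matches if available, otherwise explain why not."""
    if supports_max_cost():
        print(extract_fuzzy(["machine learning"], "I like machin learning"))
    else:
        print("max_cost is not available in the installed flashtext")
```

Calling `demo()` shows whether your installation includes the fuzzy-matching commit. If it does not, installing from the repository's source rather than a packaged release may be needed, though that is an assumption to verify against the repository itself.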