flashtext
Extract Keywords from sentence or Replace keywords in sentences.
I'm currently processing a list of 100k+ texts. Regex is incredibly slow for this so I thought FlashText would be perfect. I'm unable to use FlashText to replace the '_'...
It would be good to have a count getter to obtain the number of keywords processed. Using the processor to identify the presence of the keywords, but still in...
Hi @vi3k6i5, thanks for the wonderful library; it really helps speed up data preprocessing iterations. I plan to use this library for my internal text library, however...
Appending a number to the target keyword causes the extraction to fail.

    >>> import flashtext
    >>> _extractor = flashtext.KeywordProcessor()
    >>> _extractor.add_keyword('地中海贫血')
    True
    >>> _extractor.extract_keywords('地中海贫血')
    ['地中海贫血']
    >>> _extractor.extract_keywords('地中海贫血2')
    []
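A possible explanation, sketched in plain Python rather than with the library itself: if digits are treated as word characters, then `'地中海贫血2'` looks like one longer token and the keyword does not end at a boundary. The function name `extract_with_boundaries` and the constant `NON_WORD_BOUNDARIES` are hypothetical, the character set is an assumption, and only the trailing boundary is checked, so this is a simplified illustration and not flashtext's implementation.

```python
import string

# Assumption: characters that count as "part of a word" (digits, ASCII letters, underscore).
NON_WORD_BOUNDARIES = set(string.digits + string.ascii_letters + "_")


def extract_with_boundaries(text, keyword, non_word_boundaries=NON_WORD_BOUNDARIES):
    """Return occurrences of keyword that are not immediately followed by a word character."""
    matches = []
    start = text.find(keyword)
    while start != -1:
        end = start + len(keyword)
        # Accept the match only at the end of the text or before a boundary character.
        if end == len(text) or text[end] not in non_word_boundaries:
            matches.append(keyword)
        start = text.find(keyword, start + 1)
    return matches


print(extract_with_boundaries("地中海贫血", "地中海贫血"))   # ['地中海贫血']
print(extract_with_boundaries("地中海贫血2", "地中海贫血"))  # []
# With no characters treated as word characters, the trailing digit no longer blocks the match.
print(extract_with_boundaries("地中海贫血2", "地中海贫血", non_word_boundaries=set()))  # ['地中海贫血']
```

If this is indeed the cause, changing which characters the processor treats as word characters would be the place to look; I have not verified that against the library here.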
    from flashtext import KeywordProcessor
    kp = KeywordProcessor()
    kp.add_keyword("ABC DE")
    kp.add_keyword("DE FGHI")
    kp.extract_keywords("ABC DE FGHI")
    # returns ['ABC DE']

Why not ['ABC DE', 'DE FGHI']?
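One way to read this result: a single left-to-right pass that takes the longest keyword at each position and then resumes after it can never report a second keyword that starts inside the first. The sketch below contrasts that strategy with one that allows overlaps. Both function names are hypothetical, word boundaries are ignored for brevity, and this is plain Python for illustration, not the library's code.

```python
def extract_longest_non_overlapping(text, keywords):
    """Scan left to right; at each position take the longest keyword, then skip past it."""
    found = []
    i = 0
    while i < len(text):
        # Empty keywords are ignored so the scan always advances.
        candidates = [k for k in keywords if k and text.startswith(k, i)]
        if candidates:
            best = max(candidates, key=len)
            found.append(best)
            i += len(best)  # the matched span is consumed
        else:
            i += 1
    return found


def extract_overlapping(text, keywords):
    """Report every keyword occurrence, even when occurrences share characters."""
    found = []
    for i in range(len(text)):
        for k in keywords:
            if k and text.startswith(k, i):
                found.append(k)
    return found


keywords = ["ABC DE", "DE FGHI"]
print(extract_longest_non_overlapping("ABC DE FGHI", keywords))  # ['ABC DE']
print(extract_overlapping("ABC DE FGHI", keywords))              # ['ABC DE', 'DE FGHI']
```

In the first function, "DE" is already consumed by "ABC DE", so the scan resumes at " FGHI" and "DE FGHI" is never considered.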
Hi! Thanks for this project :) It would be great if you supported the same algorithm that IBM Watson Conversation uses when the Fuzzy Matching option is activated...
To extract a mix of case-sensitive and case-insensitive keywords from text, is it possible to construct a single KeywordProcessor that handles both? Thanks,
Hello, I encountered an issue with `span_info=True` when used on a string with combined characters. As demonstration consider the following example: ```python import re from flashtext import KeywordProcessor from unicodedata...
    from flashtext import KeywordProcessor
    keyword_processor = KeywordProcessor()
    keywordsList = ["java", "python"]
    keyword_processor.add_keywords_from_list(keywordsList)

If keywordsList contains millions of entries, keyword_processor.extract_keywords() extracts nothing. How can it handle a keyword list with millions of entries?