
The two-character Chinese word “成都” is not recognized

Open GuoPL opened this issue 2 years ago • 2 comments

from flashtext import KeywordProcessor

# text = "@苍月轶 再次核实:骆然5月8日持24小时核酸从宜昌回蓉,到成都24小时内核酸一次,9号回泸定,24小时内又做一次核酸,均阴性,健康码绿码。宜昌不是 AB区域。"
text = "成都到北京高铁3小时,郑州到成都2小时"

print(text)
kp = KeywordProcessor()
kp.add_keyword("到成都", ("成都", "ab"))
kp.add_keyword("宜昌", ("宜昌", "ab"))

print(len(kp))
print(kp)
word_index = kp.extract_keywords(text, span_info=True)
print(word_index)
for item in word_index:
    print(text[item[1]:item[2]])

print('finished')

GuoPL avatar Jun 08 '22 07:06 GuoPL
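
One possible reading of the report, sketched in plain Python: flashtext only accepts a keyword when the character after it counts as a word boundary. Assuming the library's default non-word-boundary set is ASCII letters, digits and underscore, the "2" that follows "到成都" in the sample text would block the match. The helper `ends_on_boundary` is hypothetical, checks only the trailing character, and is not part of flashtext.

```python
import string

# Assumption: mirrors flashtext's default set of characters that are
# treated as "inside a word" (not as word boundaries).
NON_WORD_BOUNDARIES = set(string.digits + string.ascii_letters + "_")


def ends_on_boundary(text, keyword):
    """For each occurrence of keyword, report whether the next char is a boundary."""
    results = []
    start = text.find(keyword)
    while start != -1:
        end = start + len(keyword)
        # End of text counts as a boundary (the empty string is never in the set).
        next_char = text[end] if end < len(text) else ""
        results.append((start, end, next_char not in NON_WORD_BOUNDARIES))
        start = text.find(keyword, start + 1)
    return results


text = "成都到北京高铁3小时,郑州到成都2小时"
boundary_report = ends_on_boundary(text, "到成都")
print(boundary_report)
```

If the last element of a reported tuple is False, the keyword is immediately followed by a digit, letter or underscore, which is the situation in which a boundary-based matcher would be expected to skip it.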

from flashtext import KeywordProcessor

text = "成都到北京高铁3小时,郑州到成都2小时"
kp = KeywordProcessor()
kp.add_keyword("到成都", ("成都", "ab"))
kp.add_keyword("宜昌", ("宜昌", "ab"))

print(len(kp))
keywords_found = kp.extract_keywords(text, span_info=True)
for item in keywords_found:
    print(item)

Output:

2
(('成都', 'ab'), 13, 15)

Reference: https://blog.csdn.net/chen10314/article/details/122048726

githublyff avatar Aug 07 '22 04:08 githublyff

This is still not a good solution, because many special characters, such as (), [], etc., can appear in our keywords.

zhangbo2008 avatar Feb 08 '23 13:02 zhangbo2008
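
For keywords that contain special characters, one workaround outside flashtext is a literal, boundary-free search with the standard `re` module. The sketch below is only an illustration under that assumption: `find_keywords` is a hypothetical helper, `re.escape` makes characters such as `(` and `[` match literally, and the payload tuples follow the `add_keyword` calls in the thread. It gives up flashtext's word-boundary check, so it will also match inside longer words.

```python
import re


def find_keywords(text, keywords):
    """Return (payload, start, end) for each literal keyword occurrence.

    keywords maps the literal keyword to the payload to report.
    No word-boundary check is applied.
    """
    # Try longer keywords first so the alternation prefers the longest match.
    ordered = sorted(keywords, key=len, reverse=True)
    # re.escape neutralizes regex metacharacters such as ( ) [ ].
    pattern = re.compile("|".join(re.escape(k) for k in ordered))
    return [(keywords[m.group(0)], m.start(), m.end())
            for m in pattern.finditer(text)]


text = "成都到北京高铁3小时,郑州到成都2小时"
keywords = {"到成都": ("成都", "ab"), "宜昌": ("宜昌", "ab")}
matches = find_keywords(text, keywords)
for payload, start, end in matches:
    print(payload, text[start:end])
```

For large keyword lists a single regex alternation can become slow, so this is a fallback for small dictionaries rather than a replacement for a trie-based matcher.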