flashtext
Not working with Chinese.
Many matches are missed when the text consists only of Chinese characters rather than space-separated words. Modifying line 523 of keyword.py did not help at all.
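Whether a keyword is found depends on which characters count as part of a word (the `non_word_boundaries` set in keyword.py). To illustrate the trade-off for text with no spaces, here is a toy whole-word matcher. `find_whole_words` is a hypothetical helper built on plain `str.find`, not flashtext's trie algorithm, and `word_chars` stands in for `non_word_boundaries`:

```python
import string


def find_whole_words(text, keywords, word_chars):
    """Return (keyword, start, end) for every occurrence of a keyword whose
    neighbouring characters are not in word_chars.

    Hypothetical helper for illustration only; NOT flashtext's trie-based algorithm.
    """
    hits = []
    for kw in keywords:
        start = text.find(kw)
        while start != -1:
            end = start + len(kw)
            # A match counts only if it is not glued to word characters on either side.
            before_ok = start == 0 or text[start - 1] not in word_chars
            after_ok = end == len(text) or text[end] not in word_chars
            if before_ok and after_ok:
                hits.append((kw, start, end))
            start = text.find(kw, start + 1)
    return sorted(hits, key=lambda hit: hit[1])


ascii_word_chars = set(string.ascii_letters + string.digits + '_')
# Additionally treat CJK Unified Ideographs (U+4E00 to U+9FFF) as word characters.
cjk_word_chars = ascii_word_chars | {chr(cp) for cp in range(0x4E00, 0xA000)}

text = '我爱北京天安门'  # unsegmented Chinese: no spaces between words
print(find_whole_words(text, ['北京'], ascii_word_chars))  # [('北京', 2, 4)]
print(find_whole_words(text, ['北京'], cjk_word_chars))    # []
```

In this toy, leaving ideographs out of the word-character set lets 北京 match inside unsegmented text, while adding them means a keyword only matches when spaces or punctuation surround it. flashtext's own traversal differs in detail, so take this as an illustration of the trade-off rather than a diagnosis of the missed matches.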
I faced the same problem and partially fixed it by adding my language's alphabet characters (in your case, Chinese) to `self.non_word_boundaries` in keyword.py, the set of characters treated as part of a word:
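```python
# Inside KeywordProcessor.__init__ in flashtext/keyword.py (`string` is imported at module level).
self._keyword = '_keyword_'
self._white_space_chars = set(['.', '\t', '\n', '\a', ' ', ','])
# Extra letters to treat as part of a word. On Python 2 the u'' prefix (and a UTF-8
# coding line) is needed so that set() yields characters rather than UTF-8 bytes.
vn_text = u'àáãạảăắằẳẵặâấầẩẫậèéẹẻẽêềếểễệđìíĩỉịòóõọỏôốồổỗộơớờởỡợùúũụủưứừửữựỳỵỷỹýÀÁÃẠẢĂẮẰẲẴẶÂẤẦẨẪẬÈÉẸẺẼÊỀẾỂỄỆĐÌÍĨỈỊÒÓÕỌỎÔỐỒỔỖỘƠỚỜỞỠỢÙÚŨỤỦƯỨỪỬỮỰỲỴỶỸÝ'  # Vietnamese alphabet characters
other_text = u'äöüßÄÖÜ'  # German alphabet characters
try:
    # Python 2.x
    self.non_word_boundaries = set(string.digits + string.letters + '_' + vn_text + other_text)
except AttributeError:
    # Python 3.x: string.letters no longer exists
    self.non_word_boundaries = set(string.digits + string.ascii_letters + '_' + vn_text + other_text)
```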
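Rather than typing each alphabet out by hand, the extra letters could be collected from Unicode blocks. A minimal Python 3 sketch, assuming that every character Unicode classifies as a letter (general category `L*`) in the Latin-1 Supplement, Latin Extended-A/B and Latin Extended Additional blocks should count as part of a word; `letters_in_range` is a hypothetical helper and the block ranges are a choice to adjust for your languages:

```python
import string
import unicodedata


def letters_in_range(first, last):
    """Return every code point in [first, last] whose Unicode general
    category is a letter (Lu, Ll, Lt, Lm or Lo). Hypothetical helper."""
    return {chr(cp) for cp in range(first, last + 1)
            if unicodedata.category(chr(cp)).startswith('L')}


# Python 3 baseline, matching the except-branch of the snippet above.
base = set(string.digits + string.ascii_letters + '_')

# U+00C0-U+024F: Latin-1 Supplement letters plus Latin Extended-A/B
#                (German umlauts and ß, Vietnamese đ, ơ, ư, ...)
# U+1E00-U+1EFF: Latin Extended Additional (most precomposed Vietnamese letters, e.g. ạ, ắ, ỹ)
extra_letters = letters_in_range(0x00C0, 0x024F) | letters_in_range(0x1E00, 0x1EFF)

non_word_boundaries = base | extra_letters
```

The category filter leaves symbols such as `×` (U+00D7) and `÷` (U+00F7) out, so they still act as boundaries. The resulting set could replace the hand-written one in keyword.py, or, assuming your flashtext version keeps it in the `non_word_boundaries` instance attribute as the snippet above does, be assigned on a `KeywordProcessor` instance after construction. Input text should be in precomposed (NFC) form, e.g. via `unicodedata.normalize('NFC', text)`, so that letters such as `ạ` arrive as single code points.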