flashtext

Not working with Chinese.

Open joshhu opened this issue 2 years ago • 1 comment

A lot of matches are missed when the keywords are plain Chinese characters rather than space-separated words. Modifying line 523 in keyword.py does not help at all.

joshhu avatar Aug 20 '22 12:08 joshhu

I faced the same problem and more or less fixed it by adding my alphabet's characters (in your case, Chinese) to the self.non_word_boundaries set, which is defined right below self._white_space_chars:

self._keyword = '_keyword_'
self._white_space_chars = set(['.', '\t', '\n', '\a', ' ', ','])
vn_text = 'àáãạảăắằẳẵặâấầẩẫậèéẹẻẽêềếểễệđìíĩỉịòóõọỏôốồổỗộơớờởỡợùúũụủưứừửữựỳỵỷỹýÀÁÃẠẢĂẮẰẲẴẶÂẤẦẨẪẬÈÉẸẺẼÊỀẾỂỄỆĐÌÍĨỈỊÒÓÕỌỎÔỐỒỔỖỘƠỚỜỞỠỢÙÚŨỤỦƯỨỪỬỮỰỲỴỶỸÝ' # My Language alphabet characters
other_text = 'äöüßÄÖÜß' # German alphabet characters
try:
    # python 2.x
    self.non_word_boundaries = set(string.digits + string.letters + '_' + vn_text + other_text)
except AttributeError:
    # python 3.x
    self.non_word_boundaries = set(string.digits + string.ascii_letters + '_' + vn_text + other_text)

Hyprnx avatar Oct 11 '22 08:10 Hyprnx
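
To make the effect of that workaround concrete, here is a minimal, self-contained sketch of how a set of "word characters" can decide whether a match counts as a whole word. It is a simplified stand-in, not flashtext's actual matching code: `find_whole_words`, `DEFAULT_WORD_CHARS`, and `EXTENDED_WORD_CHARS` are hypothetical names made up for this illustration, and only the two character sets mirror the snippet above.

```python
import string

# Default set, mirroring the Python 3 branch of the snippet above.
DEFAULT_WORD_CHARS = set(string.digits + string.ascii_letters + "_")

# Extended set: the default plus the German letters from the snippet above.
EXTENDED_WORD_CHARS = DEFAULT_WORD_CHARS | set("äöüßÄÖÜ")


def find_whole_words(text, keyword, word_chars):
    """Return start indices where `keyword` has no word character on either side.

    Hypothetical helper for illustration only; flashtext itself uses a trie.
    """
    hits = []
    start = text.find(keyword)
    while start != -1:
        end = start + len(keyword)
        # A neighbor that is NOT in word_chars is treated as a word boundary.
        before_ok = start == 0 or text[start - 1] not in word_chars
        after_ok = end == len(text) or text[end] not in word_chars
        if before_ok and after_ok:
            hits.append(start)
        start = text.find(keyword, start + 1)
    return hits


# 'ü' is not a word character by default, so "ber" is wrongly seen as a whole word.
print(find_whole_words("über", "ber", DEFAULT_WORD_CHARS))   # [1]
# With 'ü' added, "ber" is recognized as part of a longer word and is skipped.
print(find_whole_words("über", "ber", EXTENDED_WORD_CHARS))  # []
```

In this toy model the same mechanism cuts the other way for unsegmented text: if every Chinese character were added to the set, a keyword such as 北京 inside 我喜欢北京 would no longer sit between boundaries and would be skipped, so whether the workaround helps for Chinese is worth testing on real data. The flashtext README also documents `add_non_word_boundary` on `KeywordProcessor`, which may let you extend the set without editing keyword.py.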