Russian text Unicode error
Great work, but I encountered an error when extracting entities from mixed Russian and English text. The core error line is as follows:
File "D:\project\07-multilan\langextract_example.py", line 43, in
f.write(json.dumps(doc_dict, ensure_ascii=False) + '\n')
UnicodeEncodeError: 'gbk' codec can't encode character '\u0301' in position 12984: illegal multibyte sequence
Could you give a small example of the text that causes this error? Ideally one line, just a few words. That would make it much easier to reproduce on dev machines and fix.
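For what it's worth, here is a minimal sketch that should reproduce the error on any machine, not only on Windows. The traceback names U+0301 (a combining acute accent, often used as a stress mark in Russian text), so the sample string below is a hypothetical stand-in for the real input, not the original data. The sketch assumes the file in `langextract_example.py` was opened without an explicit `encoding`, so Python fell back to the system locale encoding (GBK here). Passing `encoding="utf-8"` to `open()` is one likely fix.

```python
import json
import os
import tempfile

# Hypothetical sample text: mixed English and Russian, with a combining
# acute accent (U+0301) used as a stress mark after the letter "е".
doc_dict = {"text": "Hello, приве\u0301т"}
line = json.dumps(doc_dict, ensure_ascii=False) + "\n"

# Encoding to GBK by hand mimics what a text-mode file opened with the
# GBK locale encoding would do on write.
try:
    line.encode("gbk")
    gbk_failed = False
except UnicodeEncodeError:
    gbk_failed = True

# Possible fix: open the output file with an explicit UTF-8 encoding
# instead of relying on the platform default.
path = os.path.join(tempfile.mkdtemp(), "out.jsonl")
with open(path, "w", encoding="utf-8") as f:
    f.write(line)

# Read the line back to confirm the text survives the round trip.
with open(path, encoding="utf-8") as f:
    round_trip = json.loads(f.readline())
```

If this matches the real input, the fix would be on the caller's side: change the `open(...)` call that produces `f` to pass `encoding="utf-8"`.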