
Unicode support

Open tatsuya opened this issue 8 years ago • 0 comments

Great work! It would be nice if this supported Unicode text input (for example, café). Passing non-ASCII text currently fails with the error below:

➜  elyzer git:(master) python __main__.py --es "http://localhost:9200" --index my_index --analyzer my_analyzer "café"
TOKENIZER: kuromoji_tokenizer
Traceback (most recent call last):
  File "__main__.py", line 47, in <module>
    main()
  File "__main__.py", line 36, in main
    es=es))
  File "/Users/toiwa/Projects/Private/elyzer/elyzer/elyzer.py", line 72, in stepWise
    analyzeResp = es.indices.analyze(index=indexName, body=body)
  File "/Library/Python/2.7/site-packages/elasticsearch/client/utils.py", line 73, in _wrapped
    return func(*args, params=params, **kwargs)
  File "/Library/Python/2.7/site-packages/elasticsearch/client/indices.py", line 32, in analyze
    '_analyze'), params=params, body=body)
  File "/Library/Python/2.7/site-packages/elasticsearch/transport.py", line 284, in perform_request
    body = self.serializer.dumps(body)
  File "/Library/Python/2.7/site-packages/elasticsearch/serializer.py", line 50, in dumps
    raise SerializationError(data, e)
elasticsearch.exceptions.SerializationError: ({'text': 'caf\xc3\xa9', 'char_filter': [], 'tokenizer': u'kuromoji_tokenizer'}, UnicodeDecodeError('ascii', '"caf\xc3\xa9"', 4, 5, 'ordinal not in range(128)'))
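
The traceback looks consistent with a UTF-8 byte string from the command line (`'caf\xc3\xa9'`) being mixed with Unicode strings (`u'kuromoji_tokenizer'`) when the request body is serialized to JSON under Python 2.7. A minimal sketch of one possible fix is to decode the text argument to Unicode before building the body. The helper name `to_text` and the hard-coded `raw_arg` are hypothetical and stand in for however elyzer actually reads its arguments; UTF-8 as the terminal encoding is also an assumption.

```python
# -*- coding: utf-8 -*-
import json


def to_text(value, encoding="utf-8"):
    """Return value as a Unicode string, decoding it first if it is a byte string.

    Hypothetical helper, not part of elyzer or the elasticsearch client.
    """
    if isinstance(value, bytes):  # on Python 2, bytes is an alias for str
        return value.decode(encoding)
    return value


# Stand-in for the raw command-line argument; Python 2 hands it over as bytes.
raw_arg = b"caf\xc3\xa9"

# Every string in the body is now Unicode, so serialization no longer has to
# implicitly decode a non-ASCII byte string with the ascii codec.
body = {
    "text": to_text(raw_arg),
    "char_filter": [],
    "tokenizer": u"kuromoji_tokenizer",
}
payload = json.dumps(body, ensure_ascii=False)
```

Decoding once at the input boundary keeps the rest of the code working with Unicode only, and the same helper is a no-op for values that are already text.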

tatsuya, May 11 '17 01:05