
UnicodeDecodeError with classifier.baseline()

Open jcneshi opened this issue 7 years ago • 2 comments

This is similar to, but different from, another issue posted here.

$ python jnlp-test-sentencePolarityScore.py
Traceback (most recent call last):
  File "jnlp-test-sentencePolarityScore.py", line 9, in <module>
    print classifier.baseline(text)
  File "build/bdist.macosx-10.13-intel/egg/jNlp/jSentiments.py", line 56, in baseline
  File "build/bdist.macosx-10.13-intel/egg/jNlp/jSentiments.py", line 49, in polarScores_text
  File "build/bdist.macosx-10.13-intel/egg/jNlp/jTokenize.py", line 30, in jTokenize
  File "build/bdist.macosx-10.13-intel/egg/jNlp/jCabocha.py", line 27, in cabocha
UnicodeDecodeError: 'utf8' codec can't decode byte 0xb5 in position 105: invalid start byte

At first, I had the same error with classifier.train(), but once I ran ./configure --with-charset=utf8 for both the mecab dictionary and cabocha, the error disappeared.

However, with classifier.baseline() the error remains. Is there another part of the toolchain that I need to configure for utf-8? Am I missing something really basic?
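For anyone debugging the same thing, here is a minimal sketch of how one might check which encoding a chunk of raw tool output is actually in. The byte 0xb5 can never start a UTF-8 sequence, so the data reaching the decoder may be in a different Japanese charset such as EUC-JP. The helper name guess_codec and the candidate list are my own assumptions for illustration, not part of jNlp, and the sketch is written for Python 3 even though the traceback above comes from Python 2.

```python
# Hypothetical helper, not part of jNlp: try a few codecs in order and
# report the first one that decodes the raw bytes without error.
CANDIDATE_CODECS = ("utf-8", "euc_jp", "shift_jis")


def guess_codec(raw):
    """Return the first candidate codec that decodes `raw`, or None."""
    for codec in CANDIDATE_CODECS:
        try:
            raw.decode(codec)
        except UnicodeDecodeError:
            continue
        return codec
    return None


# Sample byte strings standing in for output captured from an external tool.
sample_utf8 = "日本語".encode("utf-8")
sample_eucjp = "日本語".encode("euc_jp")

print(guess_codec(sample_utf8))
print(guess_codec(sample_eucjp))
```

This is only a heuristic: a byte string can be valid in more than one codec, so the order of the candidates matters. If the captured bytes turn out not to be UTF-8, that would point at a component still built or configured with a non-UTF-8 charset.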

Thanks!

jcneshi, Feb 16 '18 09:02

By the way, my jnlp-test-sentencePolarityScore.py file uses your code in section 1.4.2, seen here: http://jprocessing.readthedocs.io/en/latest/#how-to-use

jcneshi, Feb 16 '18 09:02

Hi, is this issue fixed?

kevincobain2000, Aug 29 '18 07:08