python-crfsuite
UnicodeDecodeError at tag method
I am basing my code on this tutorial and have a problem with the tag
method after the training step. I catch the UnicodeDecodeError
exception like this:
    try:
        for xseq in X_test:
            Y_pred.append(tagger.tag(xseq))
    except UnicodeDecodeError as e:
        print(e)
        print(e.object)
The output looks like this
'ascii' codec can't decode byte 0xc3 in position 4: ordinal not in range(128)
b'B-qu\xc3\xa9'
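The same message can be reproduced without the tagger. This is only a minimal sketch, assuming the bytes in e.object are a UTF-8 encoded label: decoding them with the ascii codec fails at byte 0xc3, while decoding them as UTF-8 gives the label back. The names raw, ascii_error and label are just for illustration.

```python
# Bytes copied from the e.object output above.
raw = b'B-qu\xc3\xa9'

# Decoding as ASCII fails on the first byte outside the 0-127 range.
try:
    raw.decode('ascii')
except UnicodeDecodeError as e:
    ascii_error = str(e)
    print(ascii_error)

# Decoding as UTF-8 succeeds: 0xc3 0xa9 is the UTF-8 encoding of 'é'.
label = raw.decode('utf-8')
print(label)
```

So the bytes themselves look like valid UTF-8, and the failure seems to come from something decoding the returned label as ASCII rather than from the contents of X_test.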
I tried decoding my X_test with decode('utf-8') before calling tag, but that does not seem to work.
Just in case it is relevant: I also had some UnicodeEncodeError problems with the trainer object, as shown below, but calling encode('utf-8') on every substring seems to fix them. With this approach I manually encode everything before appending it to the trainer. This issue is mentioned in #96, and this solution works for me.
    for xseq, yseq in zip(X_train, Y_train):
        trainer.append(xseq, yseq)
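The manual encoding step could look roughly like the sketch below. It assumes Python 3 and that each xseq is a list of feature lists and each yseq a list of label strings. The helper name to_utf8 and the sample data are hypothetical, made up for illustration, and not part of python-crfsuite.

```python
def to_utf8(obj, encoding='utf-8'):
    """Recursively encode every str inside nested lists/tuples to bytes."""
    if isinstance(obj, str):
        return obj.encode(encoding)
    if isinstance(obj, (list, tuple)):
        return [to_utf8(item, encoding) for item in obj]
    return obj  # leave anything else (bytes, numbers) untouched


# Hypothetical sample data: one sentence with two tokens.
xseq = [['bias', 'word.lower=qué'], ['bias', 'word.lower=hola']]
yseq = ['B-qué', 'O']

encoded_x = to_utf8(xseq)
encoded_y = to_utf8(yseq)

# The encoded sequences would then be passed on as in the loop above:
# trainer.append(encoded_x, encoded_y)
```

Values that are already bytes pass through unchanged, so applying the helper twice should be harmless.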
NOTE: Sorry for my poor English. I hope I've been clear enough. If not, please tell me! :)
Hello,
I have exactly the same issue. I am able to train my model with bytes, but when I use the tagger and the output is bytes, there is an internal error (same as above) which prevents me from getting the tag.
The only solution I have for the moment is to use crfsuite instead, which is able to output non-ASCII tags...