UnicodeDecodeError when printing sklearn-crfsuite CRF model features for a model trained on Unicode text

Open anandi1989 opened this issue 6 years ago • 0 comments

I have a CRF model (object type: sklearn_crfsuite.estimator.CRF) whose feature data is UTF-8 encoded. Prediction works fine. Now I want to inspect the model to understand what it has learned.

To do that, I tried to print crf.attributes_, crf.state_features_ and crf.transition_features_, but I get errors like the following:

Traceback (most recent call last):
  File "C:\Users\user123\eclipse-workspace\xxx_path\standalone scripts\crfModelAnalysis.py", line 20, in <module>
    print_transitions(Counter(crf.transition_features_).most_common(k))
  File "C:\Python27\lib\site-packages\sklearn_crfsuite\estimator.py", line 490, in transition_features_
    if self._info is None:
  File "C:\Python27\lib\site-packages\sklearn_crfsuite\estimator.py", line 499, in _info
    self._info_cached = self.tagger_.info()
  File "pycrfsuite\_pycrfsuite.pyx", line 704, in pycrfsuite._pycrfsuite.Tagger.info
  File "pycrfsuite\_pycrfsuite.pyx", line 706, in pycrfsuite._pycrfsuite.Tagger.info
  File "C:\Python27\lib\encodings\utf_8.py", line 16, in decode
    return codecs.utf_8_decode(input, errors, True)
UnicodeDecodeError: 'utf8' codec can't decode byte 0x80 in position 27: invalid start byte
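
For reference, the same kind of error can be reproduced in isolation. The sketch below is only an illustration of what the message means (a byte 0x80 appearing where a UTF-8 start byte is expected); it does not touch the model, and I have not confirmed that this is what the model file actually contains. The helper name find_invalid_utf8 and the sample attribute string are made up for the example.

```python
# -*- coding: utf-8 -*-
from __future__ import print_function


def find_invalid_utf8(raw):
    """Return (offset, byte value) of the first byte that breaks
    UTF-8 decoding, or None if the whole byte string decodes cleanly."""
    try:
        raw.decode("utf-8")
    except UnicodeDecodeError as exc:
        # exc.start is the offset of the first undecodable byte.
        return exc.start, bytearray(raw)[exc.start]
    return None


# Hypothetical attribute name: valid UTF-8 text followed by a stray 0x80 byte.
sample = b"word.lower=caf\xc3\xa9\x80"

print(find_invalid_utf8(sample))

# Decoding with the "replace" error handler substitutes U+FFFD for the
# bad byte instead of raising, which helps to see the surrounding text.
print(repr(sample.decode("utf-8", "replace")))
```

This only shows where a decode fails on a given byte string. The decode that fails in my case happens inside pycrfsuite's Tagger.info, so I cannot pass an error handler there myself.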

Basic info: the model is saved in pickle format. Python version: 2.7. sklearn-crfsuite==0.3.6.

Any kind of help will be highly appreciated.

anandi1989, Oct 29 '18 16:10