
Question: Is "n-best" tagging possible with CRFSuite?

Open wrznr opened this issue 7 years ago • 3 comments

The Wapiti CRF toolkit has a neat feature called N-best Viterbi output, which returns the n-best label sequences for an input sequence. Is there similar functionality in crfsuite?

Thanks for your hints!

wrznr avatar Jun 30 '17 11:06 wrznr

CRFSuite does not support n-best output. The decoder uses the Viterbi algorithm, which does not appear too difficult to extend to n-best (especially for short sequences).
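To illustrate the kind of extension meant here, below is a minimal sketch of n-best Viterbi over a linear chain. It is not CRFSuite code: the function name nbest_viterbi and the emissions/transitions score tables are assumptions made for this example, scores are treated as additive (log-space), and the input is assumed to be non-empty. Instead of one best path per label at each position, it keeps up to n.

```python
import heapq


def nbest_viterbi(emissions, transitions, n):
    """Return up to n highest-scoring label sequences as (score, labels) pairs.

    emissions[t][y]   : score of label y at position t (hypothetical input)
    transitions[a][b] : score of moving from label a to label b (hypothetical input)
    """
    num_labels = len(emissions[0])
    # beams[y] holds up to n (score, path) pairs for partial paths ending in y.
    beams = [[(emissions[0][y], (y,))] for y in range(num_labels)]
    for t in range(1, len(emissions)):
        new_beams = []
        for y in range(num_labels):
            # Extend every kept path from every previous label to label y.
            candidates = (
                (score + transitions[prev][y] + emissions[t][y], path + (y,))
                for prev in range(num_labels)
                for score, path in beams[prev]
            )
            new_beams.append(heapq.nlargest(n, candidates))
        beams = new_beams
    # Merge the per-label beams and keep the overall top n.
    return heapq.nlargest(n, (item for beam in beams for item in beam))


if __name__ == "__main__":
    emissions = [[3, 0], [0, 2]]
    transitions = [[1, 0], [0, 1]]
    for score, labels in nbest_viterbi(emissions, transitions, 2):
        print(score, labels)
```

Keeping n candidates per label and position multiplies the work of plain Viterbi by roughly n, which is one reason it stays cheap for short sequences.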

Did you manage to get meaningful n-best outputs with Wapiti on your data? I looked at it a while ago and found that on my data (NER) the n-best outputs do not always make sense.

usptact avatar Jun 30 '17 17:06 usptact

How about looking at the marginal probabilities for all possible labels at a given position and picking the best n values? That functionality exists in the Python wrapper as pycrfsuite.Tagger.marginal(), so I presume it also exists in CRFSuite itself.
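A minimal sketch of that idea, assuming a tagger object that exposes set(), labels() and marginal(y, t) the way pycrfsuite.Tagger does. The helper name top_labels_by_marginal is made up for this example, and the model path in the usage comment is hypothetical.

```python
def top_labels_by_marginal(tagger, xseq, n):
    """For each position, return the n (probability, label) pairs
    with the highest marginal probability."""
    tagger.set(xseq)  # make xseq the current sequence for marginal()
    labels = tagger.labels()
    ranked = []
    for t in range(len(xseq)):
        scored = sorted(
            ((tagger.marginal(y, t), y) for y in labels), reverse=True
        )
        ranked.append(scored[:n])
    return ranked


# Usage with python-crfsuite (model path is hypothetical):
#   import pycrfsuite
#   tagger = pycrfsuite.Tagger()
#   tagger.open("model.crfsuite")
#   print(top_labels_by_marginal(tagger, xseq, 3))
```

This ranks labels independently at each position; it does not rank whole label sequences.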

ZmeiGorynych avatar Jun 12 '18 12:06 ZmeiGorynych

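@ZmeiGorynych Unfortunately, marginals are not enough to compute the n-best sequence taggings. They are per-position probabilities and say nothing about how the labels at neighboring positions combine.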
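A toy example with made-up numbers shows the gap. The joint distribution below is an assumption chosen for illustration, not output from any model: picking the best label at each position from the marginals yields a sequence that is not among the two most probable sequences.

```python
# Toy joint distribution over two-position label sequences (made-up numbers).
joint = {
    ("A", "A"): 0.40,
    ("B", "B"): 0.35,
    ("A", "B"): 0.25,
    ("B", "A"): 0.00,
}


def marginal(position, label):
    """Marginal probability of `label` at `position` under the toy joint."""
    return sum(p for seq, p in joint.items() if seq[position] == label)


# Best label at each position, chosen independently from the marginals.
by_marginals = tuple(
    max("AB", key=lambda y: marginal(t, y)) for t in range(2)
)

# The two most probable sequences under the joint distribution.
two_best = sorted(joint, key=joint.get, reverse=True)[:2]

if __name__ == "__main__":
    print("from marginals:", by_marginals, joint[by_marginals])
    print("true 2-best:   ", two_best)
```

Here the marginals favor A at the first position and B at the second, yet the sequence (A, B) has lower joint probability than both (A, A) and (B, B).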

usptact avatar Jun 12 '18 23:06 usptact