Explain why a subject was matched

Open • osma opened this issue 8 years ago • 4 comments

When Annif returns bad subjects, it can be difficult to understand why they were suggested. An explain parameter for the analyze functionality could enable an explanation mode that returns, for each suggested subject, the text of all the blocks in the document that contributed to the subject assignment, sorted by score (highest first). This would give at least some idea of which parts of the document caused the match.
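
For illustration, the idea could look roughly like the Python sketch below. This is not an actual Annif feature: the REST endpoint URL, the tfidf-en project name, and the paragraph-based block splitting are all assumptions.

import collections
import requests

# Assumed: a locally running Annif instance serving its REST API;
# the project name is just an example.
API = "http://localhost:5000/v1/projects/tfidf-en/suggest"

def suggest(text, limit=10):
    """Return (uri, label, score) suggestions for a piece of text."""
    resp = requests.post(API, data={"text": text, "limit": limit})
    resp.raise_for_status()
    return [(r["uri"], r["label"], r["score"]) for r in resp.json()["results"]]

def explain(document):
    """For each subject suggested for the whole document, collect the
    paragraph-level blocks that also match that subject, best block first."""
    blocks = [b.strip() for b in document.split("\n\n") if b.strip()]
    per_subject = collections.defaultdict(list)
    for block in blocks:
        for uri, label, score in suggest(block):
            per_subject[(uri, label)].append((score, block))
    return {label: (doc_score, sorted(per_subject.get((uri, label), []), reverse=True))
            for uri, label, doc_score in suggest(document)}

A real implementation would want access to the per-block scores inside each backend, but even this black-box version gives a rough picture of where a match comes from.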

osma commented on Oct 05 '17

LIME could be useful for this: https://github.com/marcotcr/lime/
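
A rough sketch of how LIME could be wired to Annif follows. The REST endpoint, project name, and target subject URI are assumptions; also, Annif's scores are not true probabilities, but LIME only needs to see how the score changes as words are removed from the text.

import numpy as np
import requests
from lime.lime_text import LimeTextExplainer

API = "http://localhost:5000/v1/projects/tfidf-en/suggest"  # assumed local Annif
TARGET = "http://www.yso.fi/onto/yso/p19378"                # subject to explain ("cat")

def score_for_target(texts):
    """LIME classifier function: an (n, 2) array of [1 - score, score]
    for the target subject, one row per perturbed text variant."""
    rows = []
    for text in texts:
        resp = requests.post(API, data={"text": text, "limit": 100})
        scores = {r["uri"]: r["score"] for r in resp.json()["results"]}
        p = scores.get(TARGET, 0.0)
        rows.append([1.0 - p, p])
    return np.array(rows)

explainer = LimeTextExplainer(class_names=["other", "cat"])
exp = explainer.explain_instance("the cat sat on the mat", score_for_target,
                                 num_features=6, num_samples=500)
print(exp.as_list())  # each word with its weight for/against the subject

Each REST call here scores one perturbed text, so num_samples is kept small; a batched scoring function would be much faster.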

osma commented on May 19 '18

In general, I would like an option both for 'suggest' and for 'eval' that returns the confidence scores for each descriptor and for each document, for evaluation purposes. Not sure if Annif already produces such an output anywhere?

annakasprzik commented on Jul 29 '19

@annakasprzik This is what the suggest command does: it gives you the confidence scores in the output. Like this:

$ echo "the cat sat on the mat" | annif suggest tfidf-en
<http://www.yso.fi/onto/yso/p26645>	place mats	0.5739196571753897
<http://www.yso.fi/onto/yso/p19378>	cat	0.412109991386263
<http://www.yso.fi/onto/yso/p864>	Felidae	0.4004559418090339
<http://www.yso.fi/onto/yso/p24992>	stray cats	0.31746311805949967
<http://www.yso.fi/onto/yso/p24619>	exotic (cat)	0.27605877849495275
<http://www.yso.fi/onto/yso/p24278>	Norwegian forest cat	0.2735824095480068
<http://www.yso.fi/onto/yso/p24186>	Siberian cat	0.2712520343571323
<http://www.yso.fi/onto/yso/p20058>	wildcat	0.2446630680506471
<http://www.yso.fi/onto/yso/p21172>	street musicians	0.23004085661703863
<http://www.yso.fi/onto/yso/p29087>	cat breeders	0.2211696167751634

The third column is the confidence score (between 0.0 and 1.0). Its interpretation varies a bit between the models.
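
If you want to post-process this output programmatically, the TSV format is easy to parse; for example (the 0.3 cutoff is an arbitrary illustration):

import subprocess

# Run annif suggest on a text and keep only subjects above a score cutoff.
proc = subprocess.run(["annif", "suggest", "tfidf-en"],
                      input="the cat sat on the mat",
                      capture_output=True, text=True, check=True)

for line in proc.stdout.splitlines():
    uri, label, score = line.split("\t")
    if float(score) >= 0.3:
        print(label, score)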

For the eval command I don't think returning such scores makes sense, as the operation is on a higher level: you give it a set of manually indexed documents, it compares the algorithm-suggested subjects with the manual ones (taking the predicted scores into account), and it calculates overall similarity measures such as F1 and NDCG.
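
For concreteness, the per-document part of such an evaluation might look like this toy F1@k sketch. Annif's real eval also averages over the whole document collection and computes rank-aware measures like NDCG; the subject identifiers below are made up for the example.

def f1_at_k(suggested, gold, k=5):
    """Per-document F1@k: compare the k highest-scoring suggestions
    against the manually assigned (gold) subjects."""
    top = {uri for uri, score in sorted(suggested, key=lambda s: -s[1])[:k]}
    gold = set(gold)
    hits = len(top & gold)
    if hits == 0:
        return 0.0
    precision = hits / len(top)
    recall = hits / len(gold)
    return 2 * precision * recall / (precision + recall)

# Toy example: two of the three gold subjects appear among the suggestions.
suggested = [("yso:p26645", 0.57), ("yso:p19378", 0.41), ("yso:p864", 0.40),
             ("yso:p24992", 0.32), ("yso:p20058", 0.24)]
gold = ["yso:p19378", "yso:p864", "yso:p12345"]
print(f1_at_k(suggested, gold))  # 0.5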

osma commented on Sep 03 '19

BTW, there's a great blog post on the ideas behind LIME, written by its authors.

osma commented on Sep 03 '19