gateplugin-LearningFramework

Properly implement classification confidence scores

Open johann-petrak opened this issue 6 years ago • 3 comments

This is a bit messy at the moment. We should make sure that we always assign the correct confidence score to a classification (and, if possible, to all class labels) when the algorithm returns them, and that we handle things consistently when an algorithm does not return them (or does not return the full list for all classes). This also needs to work for algorithms that use the dense corpus representation, where the LF does not know any indices for class labels and we therefore cannot use an array of class confidence scores.
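
One possible shape for this, as a rough sketch only: key the scores by label name instead of by index, so the same holder works whether or not label indices are known. `LabelPrediction`, `ofLabel` and `ofScores` are hypothetical names for illustration and are not existing LF classes or methods. Treating NaN as "no score" is an assumed policy.

```java
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical result holder; not a class from the LearningFramework. */
final class LabelPrediction {
    final String label;
    final Double confidence;               // null = algorithm reported no score
    final Map<String, Double> labelScores; // empty = no per-label scores available

    private LabelPrediction(String label, Double confidence, Map<String, Double> scores) {
        this.label = label;
        this.confidence = confidence;
        this.labelScores = Collections.unmodifiableMap(new LinkedHashMap<>(scores));
    }

    /** The algorithm returned only a label, possibly with a single score. */
    static LabelPrediction ofLabel(String label, Double confidence) {
        // Assumed policy: treat NaN like "no score" so callers only test for null.
        Double c = (confidence == null || confidence.isNaN()) ? null : confidence;
        return new LabelPrediction(label, c, Collections.emptyMap());
    }

    /** The algorithm returned scores for some or all labels, keyed by label name. */
    static LabelPrediction ofScores(Map<String, Double> scores) {
        String best = null;
        double bestScore = Double.NEGATIVE_INFINITY;
        for (Map.Entry<String, Double> e : scores.entrySet()) {
            Double s = e.getValue();
            // Skip missing and NaN entries; a partial list of labels is fine.
            if (s != null && !s.isNaN() && s > bestScore) {
                best = e.getKey();
                bestScore = s;
            }
        }
        if (best == null) {
            throw new IllegalArgumentException("no usable score in " + scores);
        }
        return new LabelPrediction(best, bestScore, scores);
    }
}
```

With this layout, a caller only has to check `confidence == null` and `labelScores.isEmpty()` to find out what the algorithm actually provided.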

Currently, there is also some discrepancy between classification and chunking in how null or Double.NaN is handled as the value of a classification. Make sure we do this right in the chunking code (ModelApplication.addSurroundingAnnotation, although this should really be moved into the SeqEncoderDecoder classes).

johann-petrak, May 30 '18 16:05

Also, for both classification and chunking, the confidenceThreshold parameter should be optional with a default of null (not specified), in which case the confidence is not checked at all.
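
A minimal sketch of that check, assuming the threshold is passed around as a nullable `Double`. `ConfidenceFilter` and `accept` are hypothetical names, not existing LF code. Rejecting a missing or NaN confidence when a threshold is set is an assumed policy, not something decided in this issue.

```java
/** Hypothetical helper; not part of the LearningFramework API. */
final class ConfidenceFilter {
    /**
     * Returns true if a prediction with the given confidence should be kept.
     * A null threshold means "not specified", so no check is performed at all.
     */
    static boolean accept(Double confidence, Double threshold) {
        if (threshold == null) {
            return true;
        }
        // Assumed policy: with an explicit threshold, a missing or NaN
        // confidence cannot pass the check.
        if (confidence == null || confidence.isNaN()) {
            return false;
        }
        return confidence >= threshold;
    }
}
```

Using the same helper in both the classification and the chunking code path would keep the null and Double.NaN handling identical in both places.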

johann-petrak, May 30 '18 16:05

Leaving the confidenceThreshold parameter empty to skip the confidence check is now implemented for the classification application.

johann-petrak, May 30 '18 16:05

Leaving the confidenceThreshold parameter empty to skip the confidence check is now implemented for the chunking application as well.

johann-petrak, May 30 '18 17:05