
Update classifier accuracy/CM stats at times besides successful training

Open StephenChan opened this issue 3 years ago • 0 comments

A source's classifier accuracy stats and confusion matrix (CM) data are refreshed only when a new classifier is trained and accepted, so the stats can become outdated. This was mentioned in issue #159 (this comment) and in issue #150 (the 'Miscellaneous' point in this comment), but it should probably get an issue of its own.

Examples of situations where this leads to the data being outdated in some way:

  1. More images have been annotated since the last classifier acceptance. It's possible to reject a few classifiers in a row due to lack of improvement, so the current annotated-image count may get much higher than the classifier stats' image count.

  2. The labelset has been changed, and the previous classifiers have been wiped, but no new classifier has been trained yet. In this case, the stats will still refer to the latest accepted classifier among the ones that were wiped. If a label was removed from the labelset, this can lead to the following server error (as seen today for example):

    DoesNotExist at /source/<id>/backend/
    LocalLabel matching query does not exist.
    
  3. Annotations may also have been changed (rather than new ones being added) since the last classifier acceptance, for example if the source owner decides that a bunch of points were mislabeled as A and should be B instead. In this situation, though, we generally don't see retraining either, so it's not just an issue of updating the classifier stats.
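To make the three cases concrete, here is a minimal sketch of a staleness check. It is not CoralNet code: `StatsSnapshot`, `SourceState`, `staleness_reasons`, and the `annotations_version` counter are all hypothetical names, and the sketch assumes the stats record what they were computed from.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class StatsSnapshot:
    """What the stored accuracy/CM stats were computed from (hypothetical)."""
    image_count: int          # annotated images when the classifier was accepted
    label_ids: frozenset      # labels that appear in the confusion matrix
    annotations_version: int  # hypothetical counter, bumped when annotations are edited


@dataclass(frozen=True)
class SourceState:
    """The source's current state (hypothetical)."""
    image_count: int
    label_ids: frozenset
    annotations_version: int


def staleness_reasons(stats, source):
    """Return the reasons the stored stats no longer match the source."""
    reasons = []
    if source.image_count > stats.image_count:
        # Case 1: more images annotated since the last acceptance.
        reasons.append("new_annotated_images")
    if stats.label_ids != source.label_ids:
        # Case 2: labelset changed; the stats may reference removed labels.
        reasons.append("labelset_changed")
    if source.annotations_version != stats.annotations_version:
        # Case 3: existing annotations were edited.
        reasons.append("annotations_changed")
    return reasons
```

A view could call something like this before rendering the stats and, for `labelset_changed`, show a "stats unavailable" message rather than looking up labels that no longer exist.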

Regarding implementation, as I recall, the accuracy stats and confusion matrix data are pulled from valresults files or something similar, which are currently generated only when a new classifier is trained. So we may have to rework how this data is stored and retrieved (related: issue #288).
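As a rough illustration of what recomputing on demand could look like, here is a sketch that builds a confusion matrix and accuracy from pairs of ground-truth and predicted labels. It makes no assumption about the valresults format; `confusion_matrix` and `accuracy` are hypothetical helpers, and skipping labels outside the current labelset is one possible policy, not the existing behavior.

```python
def confusion_matrix(true_labels, predicted_labels, label_ids):
    """Count (true, predicted) pairs, restricted to the current labelset.

    Pairs that involve a label outside label_ids (for example, a label
    removed from the labelset) are skipped and counted, not raised on.
    """
    index = {label_id: i for i, label_id in enumerate(label_ids)}
    size = len(label_ids)
    matrix = [[0] * size for _ in range(size)]
    skipped = 0
    for true_label, predicted_label in zip(true_labels, predicted_labels):
        if true_label not in index or predicted_label not in index:
            skipped += 1
            continue
        # Rows are ground-truth labels, columns are predicted labels.
        matrix[index[true_label]][index[predicted_label]] += 1
    return matrix, skipped


def accuracy(matrix):
    """Fraction of counted points on the diagonal, or None if there are none."""
    total = sum(sum(row) for row in matrix)
    if total == 0:
        return None
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    return correct / total
```

Returning the skipped count alongside the matrix lets the caller report how many points were left out instead of failing with a `DoesNotExist` error.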

It may be desirable to keep the historical accuracy/CM (particularly accuracy) from when each classifier was first trained, for example to offer different options for graphing classifier improvement over time. That would involve creating a new field on the Classifier model.

StephenChan, Dec 06 '21 23:12