
[Question]: Anomaly Detection / One Class Classification

Open quantarb opened this issue 6 months ago • 1 comments

Question

Can Flair be used to train a classifier on data from only one class, so that it predicts the likelihood that new text belongs to that class? I currently use a two-class classifier that separates my target documents from a random assortment of Wikipedia articles, which serve as the second class. This feels wrong, because it requires generating an exhaustive list of counterexamples. Wouldn't modeling this as an anomaly detection problem be more appropriate?

quantarb avatar Feb 27 '24 19:02 quantarb

Hi @quantarb, Flair doesn't support an anomaly detection model. I think the 2-class approach is already a good solution if you combine it with a sampling strategy:

  • train a classifier with all the positive examples you have + a few negatives that you have chosen by hand
  • predict the whole corpus, or a subset that is large enough. Sort by the model's confidence (highest confidence for anomaly first) and manually label the first N (I would take around 100) as anomaly/not-anomaly.
  • if the newly labeled examples contain too many not-anomalies, add them to the training data and start again at the first step.
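
The review step of this loop can be sketched in plain Python. This is only an illustration under assumptions, not Flair API: `select_for_review`, `needs_another_round`, the `anomaly_score` callable, and the 0.5 threshold for "too many not-anomalies" are all hypothetical names and values. With a trained Flair classifier, the score would presumably come from the predicted label's confidence.

```python
from typing import Callable, List, Tuple


def select_for_review(
    texts: List[str],
    anomaly_score: Callable[[str], float],
    n: int = 100,
) -> List[Tuple[str, float]]:
    """Return the n texts the model is most confident are anomalies.

    anomaly_score is a hypothetical callable that wraps the trained
    classifier and returns its confidence for the anomaly class.
    """
    scored = [(text, anomaly_score(text)) for text in texts]
    # Highest anomaly confidence first, as described in the second step.
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:n]


def needs_another_round(
    manual_labels: List[bool],
    max_false_alarm_rate: float = 0.5,  # assumed threshold, tune as needed
) -> bool:
    """Decide whether to retrain.

    manual_labels[i] is True if reviewed example i really is an anomaly.
    Returns True when the share of not-anomalies exceeds the threshold.
    """
    if not manual_labels:
        return False
    false_alarms = sum(1 for is_anomaly in manual_labels if not is_anomaly)
    return false_alarms / len(manual_labels) > max_false_alarm_rate
```

Each round, the manually labeled examples would be added to the training set before retraining, so the hand-picked negatives grow only where the model is currently wrong.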

However, if you don't find that sufficient, I suppose you will be happier with approaches that are not supported here, and you might need to do more research.

helpmefindaname avatar Mar 01 '24 09:03 helpmefindaname