flair
[Question]: Anomaly Detection / One Class Classification
Question
Can Flair be used to train a classifier on data from only one class, so that it predicts the likelihood that new text belongs to that class? I currently use a two-class classifier that distinguishes my target documents from a random assortment of Wikipedia articles as the second class. However, this method seems wrong, as it requires generating an exhaustive list of counterexamples. Wouldn't modeling this as an anomaly detection problem be more appropriate?
Hi @quantarb, Flair doesn't support an anomaly detection model. I think the two-class approach is already a good solution if you combine it with a sampling strategy:
- Train a classifier with all the positive examples you have plus a few negatives that you have chosen by hand.
- Predict on the whole corpus, or on a subset that is large enough. Sort by the confidence of the model (highest confidence for anomaly first) and manually label the first N (I would take around 100) as anomaly/not-anomaly.
- If the newly labeled examples contain too many not-anomalies, add them to the training data and start again at the first step.
However, if you don't find that sufficient, I suppose you will be happier with approaches that are not supported here, and those will need more research on your side.
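For what it's worth, the selection part of the sampling strategy above could be sketched roughly like this. It is only an illustration, not Flair code: `select_for_review`, `needs_another_round`, `score_fn`, and the 0.2 threshold are all hypothetical names and values made up for this example. `score_fn` stands in for whatever returns your trained classifier's confidence for the anomaly class on a single text.

```python
from typing import Callable, List, Tuple


def select_for_review(
    texts: List[str],
    score_fn: Callable[[str], float],
    n: int = 100,
) -> List[Tuple[str, float]]:
    """Return the n texts with the highest model confidence, highest first.

    score_fn is a hypothetical placeholder: it should wrap your trained
    classifier and return its confidence for the anomaly class.
    """
    scored = [(text, score_fn(text)) for text in texts]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:n]


def needs_another_round(
    is_anomaly: List[bool],
    max_wrong_fraction: float = 0.2,  # assumed threshold, tune for your data
) -> bool:
    """Decide whether to retrain, given the manual labels of the reviewed batch.

    is_anomaly holds one manual label per reviewed text
    (True = anomaly, False = not-anomaly).
    """
    if not is_anomaly:
        return False
    wrong = sum(1 for label in is_anomaly if not label)
    return wrong / len(is_anomaly) > max_wrong_fraction
```

You would call `select_for_review` on your unlabeled corpus after each training run, label the returned texts by hand, and retrain with those added to the training data whenever `needs_another_round` returns `True`.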