
Add OneClassClassifier model

Open jeffpicard opened this issue 1 year ago • 6 comments

This PR adds OneClassClassifier to flair.models for https://github.com/flairNLP/flair/issues/3496.

The task, usage, and architecture are described in the class docstring.

The architecture is inspired by papers such as Anomaly Detection Using Autoencoders with Nonlinear Dimensionality Reduction. While it doesn't achieve state-of-the-art results or implement refinements such as adding noise, I thought I'd see whether you're interested, since it's a new task formulation that works and might be useful to others.

The interface requires users to set the threshold explicitly; I'm not sure whether there's a cleaner way to hook that in so it happens automatically after training completes.

Here's a short script demonstrating its usage by separating IMDB from STACKOVERFLOW:

import json

from torch.utils.data import Subset

from flair.data import Sentence, Corpus
from flair.datasets import IMDB, STACKOVERFLOW
from flair.embeddings import TransformerWordEmbeddings
from flair.models.one_class_classification_model import OneClassClassifier
from flair.trainers import ModelTrainer

label_type = "sentiment"
embeddings = TransformerWordEmbeddings(
    model="xlm-roberta-base",
    is_document_embedding=True,
)

# Train on just IMDB, infer IMDB vs STACKOVERFLOW
corpus = Corpus(
    train=[x for x in Subset(IMDB().train, range(500))],
    test=[x for x in Subset(IMDB().test, range(250))] + [Sentence(x.text).add_label(typename=label_type, value="<unk>") for x in Subset(STACKOVERFLOW().test, range(250))]
)

label_dictionary = corpus.make_label_dictionary(label_type)
model = OneClassClassifier(embeddings, label_dictionary, label_type=label_type)

trainer = ModelTrainer(model, corpus)
trainer.fine_tune("./tmp")

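# calibrate the decision threshold on the held-out dev split, then evaluate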
threshold = model.calculate_threshold(corpus.dev)
model.threshold = threshold
result = model.evaluate(corpus.test, gold_label_type=label_type)
print(json.dumps(result.classification_report, indent=2))

prints

{
  "POSITIVE": {
    "precision": 1.0,
    "recall": 1.0,
    "f1-score": 1.0,
    "support": 250.0
  },
  "<unk>": {
    "precision": 1.0,
    "recall": 1.0,
    "f1-score": 1.0,
    "support": 250.0
  },
  "accuracy": 1.0,
[...]
}

Thanks for any time you're willing to put into considering this! :)

jeffpicard avatar Jul 15 '24 15:07 jeffpicard

@jeffpicard thanks for the PR!

@elenamer can you take a look?

alanakbik avatar Jul 19 '24 11:07 alanakbik

Many thanks for the review! I've squashed in a commit with your requested changes (Implement mini_batch_size and verbose; Rename loss). @elenamer would you be willing to take another look please?

jeffpicard avatar Aug 06 '24 07:08 jeffpicard

(CI was failing with errors that looked unrelated to this branch so I clicked the "rebase" button in the UI)

jeffpicard avatar Aug 07 '24 18:08 jeffpicard

Hi @alanakbik. I'm extremely sorry I took so long to reply, and many thanks for your thoughts.

To your points,

  1. Thanks for calling out the eval metrics confusion. I think this could be fixed by adding a plugin that calculates thresholds inside after_training_epoch, similar to what's being done in this PR. Requiring users to remember to add a plugin isn't ideal, though. It's outside the scope here, but that awkwardness might go away if flair allowed a model itself to handle trainer events the way a plugin does, without needing a separate plugin. (A rough sketch of such a plugin follows after this list.)
  2. Hmm, those are great properties that emerge if this is modeled as a regression. My worry is that for many anomaly detection datasets it would be hard to come up with a continuous label, rather than a discrete one, to serve as the target. Printing the top 10 can still be achieved in a classifier with return_probabilities_for_all_classes=True and sorting by probability. I haven't personally worked out end-to-end how a regression formulation would look, but if you have and can help, that sounds great.
  3. What do you think about AnomalyDetection rather than OutlierDetection? It probably doesn't matter, but these terms sometimes carry more specific meanings: outlier detection means the training set contains both inliers and outliers; novelty detection means the training set contains only inliers; anomaly detection covers either (e.g. sklearn's docs). The algorithm here only really applies to novelty detection, but maybe the future of this class involves more parameters specifying which algorithm to use.
  4. Thanks, I also thought throwing an exception for multi-class corpora is surprising and generally not ideal. I think multi-class could be handled by adding another dimension to the tensors to hold the extra encoder/decoder networks; I'll give that a try. It might also be possible to do "<unk>" vs. any-class-in-training with a single encoder/decoder network.
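
For point 1, here's a minimal sketch of what I have in mind, assuming flair's TrainerPlugin hook mechanism and the calculate_threshold() method from this PR; the plugin name and its constructor are made up:

from flair.trainers.plugins import TrainerPlugin


class ThresholdCalculationPlugin(TrainerPlugin):
    """Recalibrate the model's decision threshold after every training epoch."""

    def __init__(self, calibration_split):
        super().__init__()
        # the split to calibrate on, e.g. corpus.dev
        self.calibration_split = calibration_split

    @TrainerPlugin.hook
    def after_training_epoch(self, **kwargs):
        # assumes the plugin can reach the model being trained through the trainer
        model = self.trainer.model
        model.threshold = model.calculate_threshold(self.calibration_split)

It could then be passed to training as something like trainer.fine_tune("./tmp", plugins=[ThresholdCalculationPlugin(corpus.dev)]).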

Two more points I'm wondering what you (or others) think about:

  • flair/nn/decoder.py: Moving the embedding -> score logic into a decoder similar to PrototypicalDecoder rather than keeping it in this class. This would allow the autoencoder technique to be reused in other classes (e.g. Regressors, TextPairClassifier, TextTripleClassifier) or swapped out in this class. (A rough sketch follows the example below.)
  • Anomaly Detection inside DefaultClassifier rather than in this separate class. I think Anomaly Detection can be viewed as basically DefaultClassifier, except able to return "<unk>". DefaultClassifier could get a parameter, novelty: bool, and the implementation would change to something like:
    # in predict()
    if self.multi_label:
        sigmoided = ...
    elif self.novelty:
        # add "<unk>" class
    else:
        softmax = ...
    

Altogether this could look like

anomaly_detector = TextClassifier(
    novelty=True
)
trainer.fine_tune(
    plugins=[ThresholdCalculationPlugin()],
)
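
And to make the flair/nn/decoder.py idea concrete, here's a rough sketch of an autoencoder decoder that turns embeddings into anomaly scores, mirroring how PrototypicalDecoder maps embeddings to scores; the class name, layer sizes, and scoring choice are all just assumptions:

import torch


class AutoencoderDecoder(torch.nn.Module):
    """Score embeddings by reconstruction error: low error = inlier, high error = anomaly."""

    def __init__(self, embedding_size: int, hidden_size: int = 128):
        super().__init__()
        self.encoder = torch.nn.Sequential(
            torch.nn.Linear(embedding_size, hidden_size),
            torch.nn.ReLU(),
        )
        self.decoder = torch.nn.Linear(hidden_size, embedding_size)

    def forward(self, embedded: torch.Tensor) -> torch.Tensor:
        reconstructed = self.decoder(self.encoder(embedded))
        # per-sample mean squared reconstruction error, usable as the anomaly score
        return torch.mean((embedded - reconstructed) ** 2, dim=-1)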

I'm sorry this got so long! To focus on some yes/no questions that I think can be decided independently:

  • ThresholdPlugin
  • flair/nn/decoder.py
  • novelty=True option for DefaultClassifier
    • This is the only thing that modifies existing code

jeffpicard avatar Aug 23 '24 20:08 jeffpicard

Hello @jeffpicard, sorry from my side as well that it took so long to reply! Thanks for the many ideas and input!

  • ThresholdPlugin -> yes, I think that's a good idea, especially in combination with your "out-of-scope" idea. It would actually not be so difficult to give models the ability to contain default plugins: the ModelTrainer would simply need to check whether the model carries any plugins and add those to training by default (roughly as sketched below). So having a ThresholdPlugin now would also prepare (and motivate) a future step of having plugins at the models themselves.

  • decoder and support in DefaultClassifier -> yes, I think it would be great to have this ability for other types of tasks as well. For instance, @elenamer just added the NER_NOISEBENCH dataset, which we are using for noise-robust learning research at the NER level (see our paper), so it would be interesting to see whether this approach could be used for word-level predictions. I guess the parameter novelty=True would be hard for users to parse, so maybe something more descriptive such as use_as_anomaly_detector.
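
Purely to illustrate the "default plugins at the model" idea above (none of this is existing flair API), the merging inside ModelTrainer could look roughly like this:

# hypothetical: a model declares the plugins it wants attached by default
model_plugins = getattr(self.model, "default_plugins", [])

# merge them with the user-supplied plugins, skipping duplicates by type
for plugin in model_plugins:
    if not any(isinstance(existing, type(plugin)) for existing in plugins):
        plugins.append(plugin)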

alanakbik avatar Dec 19 '24 13:12 alanakbik

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

stale[bot] avatar Apr 26 '25 04:04 stale[bot]