Zero (0) F1 Score for Sequence Tagger with transformer embeddings and BIO tags
Question
Hi, I have data in BIO format (not BIOES). I am training a sequence tagger model with transformer embeddings, but I consistently get a 0 F1 score at every epoch with XLM-ROBERTA-LARGE, while other models (BERT-BASE-UNCASED) give a non-zero F1 score. Could you please help me understand the reason? I can confirm that the loss decreases consistently. Code for XLM-ROBERTA-LARGE below:
# tag to predict
tag_type = 'ner'
# make tag dictionary from the corpus
label_dict = corpus.make_label_dictionary('ner', add_unk=False)
print(label_dict.get_items())
from flair.embeddings import TransformerWordEmbeddings
embeddings = TransformerWordEmbeddings(
    model='xlm-roberta-large',
    layers="-1",
    subtoken_pooling="first",
    fine_tune=True,
    use_context=False,
)
from flair.models import SequenceTagger
tagger = SequenceTagger(
    hidden_size=256,
    embeddings=embeddings,
    tag_dictionary=label_dict,
    tag_type='ner',
    tag_format='BIO',
    use_crf=True,
    use_rnn=False,
    reproject_embeddings=False,
)
print(tagger)
from flair.trainers import ModelTrainer
trainer = ModelTrainer(tagger, corpus)
print(trainer)
trainer.train(
    'resources/taggers/xlm-roberta-large',
    learning_rate=0.005,
    max_epochs=10,
    mini_batch_size=16,
    patience=2,
    mini_batch_chunk_size=1,  # remove this parameter to speed up computation if you have a big GPU
    embeddings_storage_mode='none',
    checkpoint=True,
    write_weights=True,
)
Training data snapshot: (screenshot omitted)
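The corpus object used above is not shown being created; for reference, a minimal sketch of how a BIO-tagged corpus is typically loaded with Flair's ColumnCorpus reader. The data folder, file names, and column layout here are illustrative assumptions, not taken from the original issue.

from flair.datasets import ColumnCorpus

# assumed CoNLL-style layout: one token per line, column 0 = token, column 1 = BIO tag
columns = {0: 'text', 1: 'ner'}

# hypothetical folder and file names
corpus = ColumnCorpus('data/', columns,
                      train_file='train.txt',
                      dev_file='dev.txt',
                      test_file='test.txt')
print(corpus)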

Hello @iambankaratharva, it could be that your learning rate is too high for XLM-RoBERTa-Large. This model is really large, so we typically use a much smaller learning rate, around 5e-6.
Also, we recommend using the fine_tune method, as illustrated in the script here; a rough sketch is below.
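A minimal sketch of that fine-tuning setup, reusing the tagger and corpus from the question; the output path and hyperparameters are illustrative assumptions, not the exact values from the linked script.

from flair.trainers import ModelTrainer

trainer = ModelTrainer(tagger, corpus)

# fine_tune is Flair's transformer fine-tuning routine and defaults to a small
# learning rate; large models like xlm-roberta-large usually need something in
# the 5e-6 range to move past a 0 F1 score
trainer.fine_tune('resources/taggers/xlm-roberta-large-finetune',  # illustrative output path
                  learning_rate=5e-6,
                  mini_batch_size=16,
                  mini_batch_chunk_size=1,  # keep if GPU memory is tight
                  max_epochs=10,
                  )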