Zero (0) F1 Score for Sequence Tagger with transformer embeddings and BIO tags
Question
Hi, I have data in BIO format (not BIOES). I am training a sequence tagger model with transformer embeddings, but I consistently get a 0 F1 score at every epoch with XLM-ROBERTA-LARGE, while other models (BERT-BASE-UNCASED) give a non-zero F1 score. Could you please help me understand the reason? I can confirm that the loss decreases consistently. Code for XLM-ROBERTA-LARGE below:
# tag to predict
tag_type = 'ner'
# make tag dictionary from the corpus
label_dict = corpus.make_label_dictionary('ner', add_unk=False)
print(label_dict.get_items())
from flair.embeddings import TransformerWordEmbeddings
embeddings = TransformerWordEmbeddings(
    model='xlm-roberta-large',
    layers="-1",
    subtoken_pooling="first",
    fine_tune=True,
    use_context=False,
)
from flair.models import SequenceTagger
tagger = SequenceTagger(
    hidden_size=256,
    embeddings=embeddings,
    tag_dictionary=label_dict,
    tag_type='ner',
    tag_format='BIO',
    use_crf=True,
    use_rnn=False,
    reproject_embeddings=False,
)
print(tagger)
from flair.trainers import ModelTrainer
trainer = ModelTrainer(tagger, corpus)
print(trainer)
trainer.train(
    'resources/taggers/xlm-roberta-large',
    learning_rate=0.005,
    max_epochs=10,
    mini_batch_size=16,
    patience=2,
    mini_batch_chunk_size=1,  # remove this parameter to speed up computation if you have a big GPU
    embeddings_storage_mode='none',
    checkpoint=True,
    write_weights=True,
)
Training data snapshot: (screenshot omitted)
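The corpus object used above is not shown being created; for reference, a minimal sketch of how a BIO-tagged corpus is typically loaded with Flair's ColumnCorpus reader. The data folder, file names, and column layout here are illustrative assumptions, not taken from the original issue.

from flair.datasets import ColumnCorpus

# assumed CoNLL-style layout: one token per line, column 0 = token, column 1 = BIO tag
columns = {0: 'text', 1: 'ner'}

# hypothetical folder and file names
corpus = ColumnCorpus('data/', columns,
                      train_file='train.txt',
                      dev_file='dev.txt',
                      test_file='test.txt')
print(corpus)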

Hello @iambankaratharva, it could be that your learning rate is too high for XLM-RoBERTa-Large. This model is really large, so we typically use a much smaller learning rate, around 5e-6.
Also, we recommend using the fine_tune method, as illustrated in the script here; a rough sketch is below.
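A minimal sketch of that fine-tuning setup, reusing the tagger and corpus from the question; the output path and hyperparameters are illustrative assumptions, not the exact values from the linked script.

from flair.trainers import ModelTrainer

trainer = ModelTrainer(tagger, corpus)

# fine_tune is Flair's transformer fine-tuning routine and defaults to a small
# learning rate; large models like xlm-roberta-large usually need something in
# the 5e-6 range to move past a 0 F1 score
trainer.fine_tune('resources/taggers/xlm-roberta-large-finetune',  # illustrative output path
                  learning_rate=5e-6,
                  mini_batch_size=16,
                  mini_batch_chunk_size=1,  # keep if GPU memory is tight
                  max_epochs=10,
                  )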