All predictions are <unk>
I'm running the code from https://medium.com/thecyphy/training-custom-ner-model-using-flair-df1f9ea9c762 for NER on a custom dataset, and I find that no matter how I change the learning rate, every prediction is `<unk>` and the F1 score is 0.0 on every epoch. I'm thinking there must be something wrong with the formatting of my dataset. Here is what my train set looks like, where I replace my actual tokens with Text1 to keep my data anonymous:
```
Text1 B-Brand
Text1 O
Text1 B-MPN
Text1 B-Type
Text1 B-Model
Text1 B-Color
Text1 B-Fabric Type

Text1 B-No Tag
Text1 B-Brand
Text1 B-Color
Text1 B-Pattern
Text1 B-Fabric Type
Text1 B-Model
Text1 O
Text1 B-Type
Text1 B-No Tag
Text1 B-Type
Text1 O
Text1 B-No Tag
Text1 B-Type
```
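One thing I'm unsure about (just my own guess, not something from the tutorial): some of my labels contain spaces, e.g. `B-Fabric Type` and `B-No Tag`, and I don't know how `ColumnCorpus` handles that when it splits each line into columns. A minimal check I could run, with a made-up file name:

```python
# Minimal parsing check (hypothetical file name): write a tiny two-column
# sample and inspect how Flair reads it back.
from flair.datasets import ColumnCorpus

with open("tiny.txt", "w") as f:
    f.write("Text1 B-Brand\nText1 O\nText1 B-Fabric Type\n\n")

corpus = ColumnCorpus(".", {0: "text", 1: "ner"},
                      train_file="tiny.txt", dev_file="tiny.txt", test_file="tiny.txt")
print(corpus.train[0])                      # shows which tag each token received
print(corpus.make_label_dictionary("ner"))  # shows which tag strings were extracted
```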
And here is the loss.tsv from training, starting with learning_rate=0.001 (I've already tried larger and smaller learning rates):
```
EPOCH  TIMESTAMP  BAD_EPOCHS  LEARNING_RATE  TRAIN_LOSS              DEV_LOSS                DEV_PRECISION  DEV_RECALL  DEV_F1  DEV_ACCURACY
1      21:55:24   0           0.0010         3.6659961510777896      3.160431146621704       0.0            0.0         0.0     0.0
2      21:55:30   0           0.0010         2.658900432190474       2.093571424484253       0.0            0.0         0.0     0.0
3      21:55:36   0           0.0010         1.5765421452425217      0.9758513569831848      0.0            0.0         0.0     0.0
4      21:55:42   0           0.0010         0.5964466308130153      0.21864087879657745     0.0            0.0         0.0     0.0
5      21:55:48   0           0.0010         0.12082597720927506     0.027130696922540665    0.0            0.0         0.0     0.0
6      21:55:55   0           0.0010         0.015038865753739897    0.0025882211048156023   0.0            0.0         0.0     0.0
7      21:56:02   0           0.0010         0.001861507955604636    0.000609234906733036    0.0            0.0         0.0     0.0
8      21:56:09   0           0.0010         0.0007104066469299261   0.0003203396627213806   0.0            0.0         0.0     0.0
9      21:56:16   0           0.0010         0.0004282736406687817   0.0002125622413586825   0.0            0.0         0.0     0.0
10     21:56:23   0           0.0010         0.0003175982157330431   0.00015547996736131608  0.0            0.0         0.0     0.0
11     21:56:30   0           0.0010         0.00023519093161660838  0.00012211497232783586  0.0            0.0         0.0     0.0
12     21:56:37   0           0.0010         0.00018551815456892758  0.00010058629413833842  0.0            0.0         0.0     0.0
13     21:56:42   0           0.0010         0.00016401175303360117  8.437278302153572e-05   0.0            0.0         0.0     0.0
14     21:56:48   0           0.0010         0.00013860434806521084  7.258114055730402e-05   0.0            0.0         0.0     0.0
15     21:56:54   0           0.0010         0.00012990906794919298  6.315676000667736e-05   0.0            0.0         0.0     0.0
16     21:57:00   0           0.0010         0.00010746981776682954  5.596564369625412e-05   0.0            0.0         0.0     0.0
17     21:57:07   0           0.0010         9.767208015885881e-05   5.0248483603354543e-05  0.0            0.0         0.0     0.0
18     21:57:13   0           0.0010         9.089903361855359e-05   4.502263982431032e-05   0.0            0.0         0.0     0.0
19     21:57:20   0           0.0010         8.164969794247736e-05   4.14940805057995e-05    0.0            0.0         0.0     0.0
20     21:57:27   0           0.0010         7.59508407533057e-05    3.7652862374670804e-05  0.0            0.0         0.0     0.0
```
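Since loss.tsv is tab-separated, it can be loaded with pandas to make the DEV_F1 column easier to watch across epochs (the path matches my training call below):

```python
import pandas as pd

# loss.tsv written by the trainer is tab-separated
log = pd.read_csv("resources/taggers/example-ner/loss.tsv", sep="\t")
print(log[["EPOCH", "TRAIN_LOSS", "DEV_LOSS", "DEV_F1"]])
```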
Notably, the loss decreases significantly, but the F1 score stays at 0.0. If it helps at all, here is also the code I use to run the trainer:
```python
# define columns
from typing import List
from flair.data import Corpus
from flair.datasets import ColumnCorpus
from flair.embeddings import TokenEmbeddings, WordEmbeddings, StackedEmbeddings
from flair.models import SequenceTagger
from flair.trainers import ModelTrainer

columns = {0: 'text', 1: 'ner'}

# directory where the data resides
data_folder = './dataset2/'

# initializing the corpus
corpus = ColumnCorpus(data_folder, columns,
                      train_file='train1.txt',
                      test_file='test1.txt',
                      dev_file='dev1.txt')

# tag to predict
tag_type = 'ner'

# make tag dictionary from the corpus
label_dictionary = corpus.make_label_dictionary(tag_type)

embedding_types: List[TokenEmbeddings] = [WordEmbeddings('glove')]
embeddings: StackedEmbeddings = StackedEmbeddings(embeddings=embedding_types)
print(embeddings)

tagger = SequenceTagger(hidden_size=256,
                        embeddings=embeddings,
                        tag_dictionary=label_dictionary,
                        tag_type=tag_type,
                        use_crf=True)

trainer = ModelTrainer(tagger, corpus)
print(trainer)

trainer.train('resources/taggers/example-ner',
              learning_rate=.001,
              mini_batch_size=32,
              max_epochs=20)
```
Please let me know if anything stands out that might explain why the model just isn't learning. Thanks!
@lukasgarbas can you check and help him?
I can't spot anything that is wrong with your data example or the code. It's hard to tell if your data is formatted properly without seeing some real training examples. I would suggest inspecting and comparing your data to some existing Flair datasets:
I downloaded an existing NER dataset with Flair:
```python
from flair.datasets import NER_ENGLISH_STACKOVERFLOW

example_corpus = NER_ENGLISH_STACKOVERFLOW()
```
And loaded it with ColumnCorpus as a custom dataset:
```python
from flair.datasets import ColumnCorpus

data_folder = '/root/.flair/datasets/ner_english_stackoverflow'
columns = {0: 'text', 1: 'ner'}

corpus = ColumnCorpus(data_folder,
                      columns,
                      train_file='train.txt',
                      dev_file='dev.txt',
                      test_file='test.txt')

label_type = "ner"
label_dict = corpus.make_label_dictionary(label_type=label_type)
```
Here are some helpful printouts to see if the data is loaded as expected:
```python
print(corpus)                      # inspect number of train/dev/test sentences

sentence = corpus.train[2]
print(sentence)                    # look if a random sentence is tagged correctly

print(corpus.obtain_statistics())  # look at the number of tokens per class
```
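If the tags look off, it can also help to print the label dictionary itself (standard Flair `Dictionary` API) to see exactly which tag strings were extracted:

```python
print(label_dict)              # how many tags the dictionary holds
print(label_dict.get_items())  # the individual tag strings Flair found
```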
Same as in the blog post, I used GloVe embeddings together with the default SequenceTagger (RNN-CRF model):
```python
from flair.embeddings import WordEmbeddings, StackedEmbeddings

embedding_types = [WordEmbeddings('glove')]  # you can add other embeddings
embeddings = StackedEmbeddings(embeddings=embedding_types)

from flair.models import SequenceTagger

tagger = SequenceTagger(hidden_size=256,
                        embeddings=embeddings,
                        tag_dictionary=label_dict,
                        tag_type=label_type,
                        use_rnn=True,
                        use_crf=True,
                        )

from flair.trainers import ModelTrainer

trainer = ModelTrainer(tagger, corpus)
trainer.train('resources/taggers/example-ner',
              learning_rate=0.1,
              mini_batch_size=32,
              )
```
These steps work fine for me. If you still need help with this issue, I would need to see a few examples of your data after formatting 🙂
This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.
Hi, I have a similar problem but found no solution: the micro avg remains at 0. By class:

```
              precision    recall  f1-score   support

PER              0.0000    0.0000    0.0000   15928.0
LOC              0.0000    0.0000    0.0000   14578.0
ORG              0.0000    0.0000    0.0000    6247.0
ANIM             0.0000    0.0000    0.0000    2341.0
PLANT            0.0000    0.0000    0.0000    1608.0
DIS              0.0000    0.0000    0.0000    1057.0
EVE              0.0000    0.0000    0.0000     847.0
FOOD             0.0000    0.0000    0.0000     705.0
TIME             0.0000    0.0000    0.0000     651.0
MEDIA            0.0000    0.0000    0.0000     538.0
CEL              0.0000    0.0000    0.0000     244.0
SUPER            0.0000    0.0000    0.0000     166.0
VEHI             0.0000    0.0000    0.0000      97.0
INST             0.0000    0.0000    0.0000      36.0
...
macro avg        0.0000    0.0000    0.0000   45053.0
weighted avg     0.0000    0.0000    0.0000   45053.0
```

```
2023-03-29 20:43:45,981 ----------------------------------------------------------------------------------------------------
{'test_score': 0.0,
 'dev_score_history': [0.0, 0.0],
 'train_loss_history': [2.1314844488807965, 0.9294938301940477],
 'dev_loss_history': [0.8666982650756836, 0.6806944012641907]}
```
I have checked the dataset as instructed above and can see that it is parsed correctly; here are the statistics:

```
Corpus: 109754 train + 15680 dev + 31358 test sentences

Sentence[36]: "Präsident Kartnig selbst , der nach dem Erringen des ersten Meistertitels in der Vereinsgeschichte unbedingt einen Star präsentieren wollte , gab die Order den Fitnesszustand zu ignorieren , ohne dies mit Trainer Ivica Osim abzusprechen ." → ["Kartnig"/PER, "Ivica Osim"/PER]

{
  "TRAIN": {
    "dataset": "TRAIN",
    "total_number_of_documents": 109754,
    "number_of_documents_per_class": {
      "LOC": 51012, "PER": 55420, "ORG": 21863, "ANIM": 8019,
      "TIME": 2368, "DIS": 3590, "PLANT": 5447, "CEL": 1010,
      "EVE": 2788, "FOOD": 2554, "SUPER": 601, "VEHI": 364,
      "MEDIA": 1991, "BIO": 23, "INST": 87, "PHY": 23
    },
    "number_of_tokens_per_tag": {},
    ...
      "avg": 17.662244897959184
    }
  }
}
```
**I am trying the MultiNERD dataset; here is a sample:**
```
Nur O
mit O
Hilfe O
von O
Aktionen O
in O
den O
neutral O
gebliebenen O
Ländern O
Schweiz B-LOC
, O
Niederlande B-LOC
und O
Schweden B-LOC
konnte O
der O
Schreckenswinter O
1918 O
/ O
19 O
überstanden O
werden O
. O

Nach O
einer O
Kabinettsumbildung O
war O
er O
zuletzt O
als O
Nachfolger O
von O
Sunthorn B-PER
Hongladarom I-PER
vom O
10 O
. O

In O
den O
folgenden O
Jahrzehnten O
wurde O
ein O
Teil O
der O
Öleinnahmen O
dadurch O
an O
ärmere O
arabische O
Staaten O
in O
Asien B-LOC
und O
Afrika B-LOC
weitergegeben O
. O
```
And here is the code I use:
```python
from flair.data import Corpus
from flair.datasets import ColumnCorpus
from flair.embeddings import TransformerWordEmbeddings
from flair.models import SequenceTagger
from flair.trainers import ModelTrainer

columns = {0: "text", 1: "ner"}

# specify the folder containing the dataset files
data_folder = "/home/ed/Desktop/ner/multinerd_flert"

corpus: Corpus = ColumnCorpus(data_folder, columns,
                              train_file="train.tsv",
                              test_file="test.tsv",
                              dev_file="valid.tsv")

# choose the pre-trained transformer model for embeddings
embedding = TransformerWordEmbeddings(
    model='xlm-roberta-large',
    layers="-1",
    subtoken_pooling="first",
    fine_tune=True,
    use_context=True,
)

# define the sequence tagger using the FLERT model
tagger = SequenceTagger(
    hidden_size=256,
    embeddings=embedding,
    tag_dictionary=corpus.make_label_dictionary("ner", add_unk=False),
    tag_type="ner",
    tag_format="BIOES",
    use_crf=False,
    use_rnn=False,
    reproject_embeddings=False,
)

# train the model
trainer = ModelTrainer(tagger, corpus)
trainer.train(
    'resources/taggers/ner-english-large',
    learning_rate=5.0e-6,
    mini_batch_size=4,
    max_epochs=10,
    mini_batch_chunk_size=1,
)
```
Any help is very welcome, thank you!