All predictions are <unk>
I'm running the code from https://medium.com/thecyphy/training-custom-ner-model-using-flair-df1f9ea9c762 for NER on a custom dataset, and I find that no matter how I change the learning rate, every prediction is `<unk>` and the F1 score is 0.0 on every epoch. I'm thinking there must be something wrong with the formatting of my dataset. Here is what my train set looks like, where I replace my actual tokens with Text1 to keep my data anonymous:
```
Text1 B-Brand
Text1 O
Text1 B-MPN
Text1 B-Type
Text1 B-Model
Text1 B-Color
Text1 B-Fabric Type

Text1 B-No Tag
Text1 B-Brand
Text1 B-Color
Text1 B-Pattern
Text1 B-Fabric Type
Text1 B-Model
Text1 O
Text1 B-Type
Text1 B-No Tag
Text1 B-Type
Text1 O
Text1 B-No Tag
Text1 B-Type
```
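One thing I'm unsure about (just my own guess, not something from the tutorial): some of my labels contain spaces, e.g. `B-Fabric Type` and `B-No Tag`, and I don't know how `ColumnCorpus` handles that when it splits each line into columns. A minimal check I could run, with a made-up file name:

```python
# Minimal parsing check (hypothetical file name): write a tiny two-column
# sample and inspect how Flair reads it back.
from flair.datasets import ColumnCorpus

with open("tiny.txt", "w") as f:
    f.write("Text1 B-Brand\nText1 O\nText1 B-Fabric Type\n\n")

corpus = ColumnCorpus(".", {0: "text", 1: "ner"},
                      train_file="tiny.txt", dev_file="tiny.txt", test_file="tiny.txt")
print(corpus.train[0])                      # shows which tag each token received
print(corpus.make_label_dictionary("ner"))  # shows which tag strings were extracted
```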
And here is the loss.tsv from training, starting with learning_rate=0.001 (I've already tried larger and smaller learning rates):
```
EPOCH  TIMESTAMP  BAD_EPOCHS  LEARNING_RATE  TRAIN_LOSS              DEV_LOSS                DEV_PRECISION  DEV_RECALL  DEV_F1  DEV_ACCURACY
1      21:55:24   0           0.0010         3.6659961510777896      3.160431146621704       0.0            0.0         0.0     0.0
2      21:55:30   0           0.0010         2.658900432190474       2.093571424484253       0.0            0.0         0.0     0.0
3      21:55:36   0           0.0010         1.5765421452425217      0.9758513569831848      0.0            0.0         0.0     0.0
4      21:55:42   0           0.0010         0.5964466308130153      0.21864087879657745     0.0            0.0         0.0     0.0
5      21:55:48   0           0.0010         0.12082597720927506     0.027130696922540665    0.0            0.0         0.0     0.0
6      21:55:55   0           0.0010         0.015038865753739897    0.0025882211048156023   0.0            0.0         0.0     0.0
7      21:56:02   0           0.0010         0.001861507955604636    0.000609234906733036    0.0            0.0         0.0     0.0
8      21:56:09   0           0.0010         0.0007104066469299261   0.0003203396627213806   0.0            0.0         0.0     0.0
9      21:56:16   0           0.0010         0.0004282736406687817   0.0002125622413586825   0.0            0.0         0.0     0.0
10     21:56:23   0           0.0010         0.0003175982157330431   0.00015547996736131608  0.0            0.0         0.0     0.0
11     21:56:30   0           0.0010         0.00023519093161660838  0.00012211497232783586  0.0            0.0         0.0     0.0
12     21:56:37   0           0.0010         0.00018551815456892758  0.00010058629413833842  0.0            0.0         0.0     0.0
13     21:56:42   0           0.0010         0.00016401175303360117  8.437278302153572e-05   0.0            0.0         0.0     0.0
14     21:56:48   0           0.0010         0.00013860434806521084  7.258114055730402e-05   0.0            0.0         0.0     0.0
15     21:56:54   0           0.0010         0.00012990906794919298  6.315676000667736e-05   0.0            0.0         0.0     0.0
16     21:57:00   0           0.0010         0.00010746981776682954  5.596564369625412e-05   0.0            0.0         0.0     0.0
17     21:57:07   0           0.0010         9.767208015885881e-05   5.0248483603354543e-05  0.0            0.0         0.0     0.0
18     21:57:13   0           0.0010         9.089903361855359e-05   4.502263982431032e-05   0.0            0.0         0.0     0.0
19     21:57:20   0           0.0010         8.164969794247736e-05   4.14940805057995e-05    0.0            0.0         0.0     0.0
20     21:57:27   0           0.0010         7.59508407533057e-05    3.7652862374670804e-05  0.0            0.0         0.0     0.0
```
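Since loss.tsv is tab-separated, it can be loaded with pandas to make the DEV_F1 column easier to watch across epochs (the path matches my training call below):

```python
import pandas as pd

# loss.tsv written by the trainer is tab-separated
log = pd.read_csv("resources/taggers/example-ner/loss.tsv", sep="\t")
print(log[["EPOCH", "TRAIN_LOSS", "DEV_LOSS", "DEV_F1"]])
```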
Notably, the loss decreases significantly, but the F1 score stays at 0.0. If it helps at all, here is also the code I use to run the trainer:
```python
# define columns
from typing import List
from flair.data import Corpus
from flair.datasets import ColumnCorpus
from flair.embeddings import TokenEmbeddings, WordEmbeddings, StackedEmbeddings
from flair.models import SequenceTagger
from flair.trainers import ModelTrainer

columns = {0: 'text', 1: 'ner'}

# directory where the data resides
data_folder = './dataset2/'

# initializing the corpus
corpus = ColumnCorpus(data_folder, columns,
                      train_file='train1.txt',
                      test_file='test1.txt',
                      dev_file='dev1.txt')

# tag to predict
tag_type = 'ner'

# make tag dictionary from the corpus
label_dictionary = corpus.make_label_dictionary(tag_type)

embedding_types: List[TokenEmbeddings] = [WordEmbeddings('glove')]
embeddings: StackedEmbeddings = StackedEmbeddings(embeddings=embedding_types)
print(embeddings)

tagger = SequenceTagger(hidden_size=256,
                        embeddings=embeddings,
                        tag_dictionary=label_dictionary,
                        tag_type=tag_type,
                        use_crf=True)

trainer = ModelTrainer(tagger, corpus)
print(trainer)

trainer.train('resources/taggers/example-ner',
              learning_rate=.001,
              mini_batch_size=32,
              max_epochs=20)
```
Please let me know if anything stands out that might explain why the model just isn't learning. Thanks!
@lukasgarbas can you check and help him?
I can't spot anything that is wrong with your data example or the code. It's hard to tell if your data is formatted properly without seeing some real training examples. I would suggest inspecting and comparing your data to some existing Flair datasets:
I downloaded an existing NER dataset with Flair:
```python
from flair.datasets import NER_ENGLISH_STACKOVERFLOW

example_corpus = NER_ENGLISH_STACKOVERFLOW()
```
And loaded it with ColumnCorpus as a custom dataset:
```python
from flair.datasets import ColumnCorpus

data_folder = '/root/.flair/datasets/ner_english_stackoverflow'
columns = {0: 'text', 1: 'ner'}

corpus = ColumnCorpus(data_folder,
                      columns,
                      train_file='train.txt',
                      dev_file='dev.txt',
                      test_file='test.txt')

label_type = "ner"
label_dict = corpus.make_label_dictionary(label_type=label_type)
```
Here are some helpful printouts to see if the data is loaded as expected:
```python
print(corpus)                      # inspect number of train/dev/test sentences

sentence = corpus.train[2]
print(sentence)                    # look if a random sentence is tagged correctly

print(corpus.obtain_statistics())  # look at the number of tokens per class
```
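If the tags look off, it can also help to print the label dictionary itself (standard Flair `Dictionary` API) to see exactly which tag strings were extracted:

```python
print(label_dict)              # how many tags the dictionary holds
print(label_dict.get_items())  # the individual tag strings Flair found
```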
Same as in the blog post, I used GloVe embeddings together with the default SequenceTagger (RNN-CRF model):
```python
from flair.embeddings import WordEmbeddings, StackedEmbeddings

embedding_types = [WordEmbeddings('glove')]  # you can add other embeddings
embeddings = StackedEmbeddings(embeddings=embedding_types)

from flair.models import SequenceTagger

tagger = SequenceTagger(hidden_size=256,
                        embeddings=embeddings,
                        tag_dictionary=label_dict,
                        tag_type=label_type,
                        use_rnn=True,
                        use_crf=True,
                        )

from flair.trainers import ModelTrainer

trainer = ModelTrainer(tagger, corpus)
trainer.train('resources/taggers/example-ner',
              learning_rate=0.1,
              mini_batch_size=32,
              )
```
These steps work fine for me. If you still need help with this issue, I would need to see a few examples of your data after formatting 🙂
This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.
Hi, I have a similar problem but found no solution: the micro avg remains at 0. By class:

```
              precision    recall  f1-score   support

PER              0.0000    0.0000    0.0000   15928.0
LOC              0.0000    0.0000    0.0000   14578.0
ORG              0.0000    0.0000    0.0000    6247.0
ANIM             0.0000    0.0000    0.0000    2341.0
PLANT            0.0000    0.0000    0.0000    1608.0
DIS              0.0000    0.0000    0.0000    1057.0
EVE              0.0000    0.0000    0.0000     847.0
FOOD             0.0000    0.0000    0.0000     705.0
TIME             0.0000    0.0000    0.0000     651.0
MEDIA            0.0000    0.0000    0.0000     538.0
CEL              0.0000    0.0000    0.0000     244.0
SUPER            0.0000    0.0000    0.0000     166.0
VEHI             0.0000    0.0000    0.0000      97.0
INST             0.0000    0.0000    0.0000      36.0
...
macro avg        0.0000    0.0000    0.0000   45053.0
weighted avg     0.0000    0.0000    0.0000   45053.0
```

```
2023-03-29 20:43:45,981 ----------------------------------------------------------------------------------------------------
{'test_score': 0.0,
 'dev_score_history': [0.0, 0.0],
 'train_loss_history': [2.1314844488807965, 0.9294938301940477],
 'dev_loss_history': [0.8666982650756836, 0.6806944012641907]}
```
I have checked the dataset as instructed above and can see that it is parsed correctly; here are the statistics:

```
Corpus: 109754 train + 15680 dev + 31358 test sentences

Sentence[36]: "Präsident Kartnig selbst , der nach dem Erringen des ersten Meistertitels in der Vereinsgeschichte unbedingt einen Star präsentieren wollte , gab die Order den Fitnesszustand zu ignorieren , ohne dies mit Trainer Ivica Osim abzusprechen ." → ["Kartnig"/PER, "Ivica Osim"/PER]

{
  "TRAIN": {
    "dataset": "TRAIN",
    "total_number_of_documents": 109754,
    "number_of_documents_per_class": {
      "LOC": 51012, "PER": 55420, "ORG": 21863, "ANIM": 8019,
      "TIME": 2368, "DIS": 3590, "PLANT": 5447, "CEL": 1010,
      "EVE": 2788, "FOOD": 2554, "SUPER": 601, "VEHI": 364,
      "MEDIA": 1991, "BIO": 23, "INST": 87, "PHY": 23
    },
    "number_of_tokens_per_tag": {},
    ...
      "avg": 17.662244897959184
    }
  }
}
```
**I am trying the MultiNERD dataset; here is a sample:**
```
Nur O
mit O
Hilfe O
von O
Aktionen O
in O
den O
neutral O
gebliebenen O
Ländern O
Schweiz B-LOC
, O
Niederlande B-LOC
und O
Schweden B-LOC
konnte O
der O
Schreckenswinter O
1918 O
/ O
19 O
überstanden O
werden O
. O

Nach O
einer O
Kabinettsumbildung O
war O
er O
zuletzt O
als O
Nachfolger O
von O
Sunthorn B-PER
Hongladarom I-PER
vom O
10 O
. O

In O
den O
folgenden O
Jahrzehnten O
wurde O
ein O
Teil O
der O
Öleinnahmen O
dadurch O
an O
ärmere O
arabische O
Staaten O
in O
Asien B-LOC
und O
Afrika B-LOC
weitergegeben O
. O
```
And here is the code I use:
```python
from flair.data import Corpus
from flair.datasets import ColumnCorpus
from flair.embeddings import TransformerWordEmbeddings
from flair.models import SequenceTagger
from flair.trainers import ModelTrainer

columns = {0: "text", 1: "ner"}

# specify the folder containing the dataset files
data_folder = "/home/ed/Desktop/ner/multinerd_flert"

corpus: Corpus = ColumnCorpus(data_folder, columns,
                              train_file="train.tsv",
                              test_file="test.tsv",
                              dev_file="valid.tsv")

# choose the pre-trained transformer model for embeddings
embedding = TransformerWordEmbeddings(
    model='xlm-roberta-large',
    layers="-1",
    subtoken_pooling="first",
    fine_tune=True,
    use_context=True,
)

# define the sequence tagger using the FLERT model
tagger = SequenceTagger(
    hidden_size=256,
    embeddings=embedding,
    tag_dictionary=corpus.make_label_dictionary("ner", add_unk=False),
    tag_type="ner",
    tag_format="BIOES",
    use_crf=False,
    use_rnn=False,
    reproject_embeddings=False,
)

# train the model
trainer = ModelTrainer(tagger, corpus)
trainer.train(
    'resources/taggers/ner-english-large',
    learning_rate=5.0e-6,
    mini_batch_size=4,
    max_epochs=10,
    mini_batch_chunk_size=1,
)
```
Any help is very welcome, thank you!