
[Question]: Low and different results when reloading final_model.pt

Open · Tinarights opened this issue 1 year ago · 1 comment

Question

I used this code to train a NER model:

from flair.models import SequenceTagger
from flair.trainers import ModelTrainer

tagger: SequenceTagger = SequenceTagger(hidden_size=128,
                                        embeddings=embeddings,
                                        tag_dictionary=tag_dictionary,
                                        tag_type=tag_type,
                                        tag_format="BIO",
                                        use_rnn=True,
                                        use_crf=True)

trainer: ModelTrainer = ModelTrainer(tagger, corpus)
trainer.train(f'train/{folder}/model3',
              learning_rate=0.01,
              min_learning_rate=0.0001,
              mini_batch_size=64,
              embeddings_storage_mode='none',
              max_epochs=80,
              patience=3,
              train_with_dev=True,
              )

I got this result after training:

2024-01-19 03:28:19,510 Testing using last state of model ...
2024-01-19 03:28:34,248
Results:
- F-score (micro) 0.9024
- F-score (macro) 0.9026
- Accuracy 0.8443


By class:
              precision    recall  f1-score   support

           I     0.8624    0.9402    0.8996	 1087
           B     0.8879    0.9240    0.9056	  960

   micro avg     0.8741    0.9326    0.9024	 2047
   macro avg     0.8752    0.9321    0.9026	 2047
weighted avg     0.8744    0.9326    0.9024	 2047

However, when I load the model from the file, I get very low results. Is there any explanation for this?

from flair.data import Corpus
from flair.datasets import ColumnCorpus
from flair.models import SequenceTagger

tagger = SequenceTagger.load("train/NCBI-disease/model3/final-model.pt")
columns = {0: 'text', 1: 'ner', 2: 'pos'}
corpus: Corpus = ColumnCorpus(data_folder, columns, test_file='test.tsv')
print(tagger.evaluate(corpus.test, 'ner').detailed_results)


Results:
- F-score (micro) 0.0125
- F-score (macro) 0.0069
- Accuracy 0.0063

By class:
              precision    recall  f1-score   support

           I     0.0077    0.0727    0.0139      1087
           B     0.0000    0.0000    0.0000       960

   micro avg     0.0075    0.0386    0.0125      2047
   macro avg     0.0038    0.0363    0.0069      2047
weighted avg     0.0041    0.0386    0.0074      2047

# Check the dev set
print(tagger.evaluate(corpus.dev, 'ner').detailed_results)


Results:
- F-score (micro) 0.0182
- F-score (macro) 0.01
- Accuracy 0.0092

By class:
              precision    recall  f1-score   support

           I     0.0110    0.1037    0.0199      1090
           B     0.0000    0.0000    0.0000       787

   micro avg     0.0107    0.0602    0.0182      1877
   macro avg     0.0055    0.0518    0.0100      1877
weighted avg     0.0064    0.0602    0.0116      1877

Could you please help me with this as soon as possible?

I also see a similar issue when I set train_with_dev=False and reload 'best-model.pt'.

Tinarights · Jan 21 '24 12:01

Hi @Tinarights, I did not manage to reproduce this with the information provided. However, I noticed that the classes are called B and I. Am I right to assume that these are not your intended class names? (E.g. you didn't set the token labels to be [B-B, B-I, I-B, I-I, O].) If you want to detect entities with a single label, you should still provide a label name for it, e.g. using [B-Entity, I-Entity, O] as the possible token labels.
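For illustration, here is a minimal sketch of what that label scheme looks like and how to check which label names a corpus actually contains. The example rows, the data_folder value, and the Disease label name are hypothetical placeholders, not taken from the setup above:

from flair.data import Corpus
from flair.datasets import ColumnCorpus

# Hypothetical rows of a column-formatted test.tsv (columns: text, ner, pos):
#   lymphoma    B-Disease    NN
#   cells       O            NNS
columns = {0: 'text', 1: 'ner', 2: 'pos'}
data_folder = 'NCBI-disease'  # placeholder; point this at your actual data folder
corpus: Corpus = ColumnCorpus(data_folder, columns, test_file='test.tsv')

# The label dictionary should contain entries like B-Disease / I-Disease and O,
# not bare B and I, if the annotations use a named entity class.
tag_dictionary = corpus.make_label_dictionary(label_type='ner')
print(tag_dictionary)

If this prints bare B and I entries, the annotation files themselves use B and I as the entity names, which matches the per-class rows in the evaluation output above.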

helpmefindaname · Feb 02 '24 21:02