
ModelTrainer fails when training with tensorboard option

GuiGel opened this issue 2 years ago.

Describe the bug

The ModelTrainer.train method fails with UnboundLocalError: local variable 'train_part_eval_result' referenced before assignment when TensorBoard logging is enabled (use_tensorboard=True).

The error is caused by a missing level of indentation (a forgotten tab) at line 614 of the flair.trainers.trainer module, and it is easy to correct.
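The failure mode can be illustrated with a minimal, self-contained sketch. It assumes (the report does not say so explicitly) that the TensorBoard block at line 614 is meant to be nested inside the branch that assigns train_part_eval_result, and that this branch only runs when the training split is evaluated. The functions fake_evaluate, log_epoch_buggy and log_epoch_fixed and the log_train_part flag are hypothetical stand-ins, not flair's API.

```python
from typing import Dict, List, Tuple

Metrics = List[Tuple[str, str]]


def fake_evaluate() -> Dict[str, Dict[str, float]]:
    # Hypothetical stand-in for evaluating the training split.
    return {"macro avg": {"f1-score": 0.5}}


def log_epoch_buggy(log_train_part: bool, use_tensorboard: bool, metrics: Metrics) -> Dict[str, float]:
    scalars: Dict[str, float] = {}
    if log_train_part:
        # The variable is only bound inside this branch.
        train_part_eval_result = fake_evaluate()
    # Bug: this block is one indentation level too shallow, so it also runs
    # when the branch above was skipped and the variable was never bound.
    if use_tensorboard:
        for avg_type, metric_type in metrics:
            scalars[f"{avg_type}/{metric_type}"] = train_part_eval_result[avg_type][metric_type]
    return scalars


def log_epoch_fixed(log_train_part: bool, use_tensorboard: bool, metrics: Metrics) -> Dict[str, float]:
    scalars: Dict[str, float] = {}
    if log_train_part:
        train_part_eval_result = fake_evaluate()
        # Fix: nest the TensorBoard block under the branch that binds the variable.
        if use_tensorboard:
            for avg_type, metric_type in metrics:
                scalars[f"{avg_type}/{metric_type}"] = train_part_eval_result[avg_type][metric_type]
    return scalars


metrics = [("macro avg", "f1-score")]
try:
    log_epoch_buggy(log_train_part=False, use_tensorboard=True, metrics=metrics)
except UnboundLocalError as err:
    print("buggy:", err)
print("fixed:", log_epoch_fixed(log_train_part=False, use_tensorboard=True, metrics=metrics))  # → fixed: {}
```

In this sketch, an empty metrics list means the loop body never executes, so even the buggy version does not fail; that is consistent with the error surfacing here only because a non-empty metrics_for_tensorboard is passed.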

To Reproduce

from pathlib import Path

from flair.datasets import CONLL_03
from flair.embeddings import FlairEmbeddings, StackedEmbeddings
from flair.models import SequenceTagger
from flair.trainers import ModelTrainer

# 1. get the corpus
corpus = CONLL_03()
corpus.downsample(0.05)
print(corpus)

# 2. what label do we want to predict?
label_type = 'ner'

# 3. make the label dictionary from the corpus
label_dict = corpus.make_label_dictionary(label_type=label_type)
print(label_dict)

# 4. initialize embedding stack with forward and backward Flair embeddings
embedding_types = [
    FlairEmbeddings('news-forward'),
    FlairEmbeddings('news-backward'),
]

embeddings = StackedEmbeddings(embeddings=embedding_types)

# 5. initialize sequence tagger
tagger = SequenceTagger(hidden_size=256,
                        embeddings=embeddings,
                        tag_dictionary=label_dict,
                        tag_type=label_type,
                        use_crf=True)

# 6. initialize trainer
trainer = ModelTrainer(tagger, corpus)

# 7. start training
trainer.train(
    Path(__file__).parent / "bug",
    learning_rate=0.1,
    mini_batch_size=32,
    max_epochs=150,
    use_tensorboard=True,
    tensorboard_log_dir=Path(__file__).parent / "bug",
    metrics_for_tensorboard=[("macro avg", 'f1-score')],
)
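As line 147 of the traceback shows, each (metric_class_avg_type, metric_type) tuple in metrics_for_tensorboard is used as a pair of keys into the evaluation result's classification_report. The sketch below shows that lookup in isolation, assuming the report is a nested dict shaped like scikit-learn's classification_report(..., output_dict=True). The report values are hypothetical placeholders, not results from this run, and select_scalars is a hypothetical helper.

```python
from typing import Dict, List, Tuple

Report = Dict[str, Dict[str, float]]

# Hypothetical report; the numbers are placeholders, not real results.
report: Report = {
    "LOC": {"precision": 0.8, "recall": 0.7, "f1-score": 0.75},
    "macro avg": {"precision": 0.6, "recall": 0.5, "f1-score": 0.55},
}


def select_scalars(report: Report, metrics: List[Tuple[str, str]]) -> Dict[str, float]:
    """Pick one scalar per (class_avg_type, metric_type) pair, mirroring
    classification_report[metric_class_avg_type][metric_type]."""
    selected: Dict[str, float] = {}
    for class_avg_type, metric_type in metrics:
        # A KeyError here means the pair names a row or column the report lacks.
        selected[f"{class_avg_type}/{metric_type}"] = report[class_avg_type][metric_type]
    return selected


print(select_scalars(report, [("macro avg", "f1-score")]))  # → {'macro avg/f1-score': 0.55}
```

In this sketch a misspelled key (for example "f1" instead of "f1-score") would raise KeyError, which is a different failure from the UnboundLocalError reported here.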

Expected behavior

Training completes without raising an error.

Screenshots

$ python experiments/bug.py
2022-06-30 09:41:49,513 Reading data from /path/to/datasets/conll_03
2022-06-30 09:41:49,514 Train: /path/to/datasets/conll_03/train.txt
2022-06-30 09:41:49,514 Dev: /path/to/datasets/conll_03/dev.txt
2022-06-30 09:41:49,514 Test: /path/to/datasets/conll_03/test.txt
Corpus: 749 train + 173 dev + 184 test sentences
2022-06-30 09:41:57,585 Computing label dictionary. Progress:
749it [00:00, 43064.79it/s]
2022-06-30 09:41:57,604 Dictionary created for label 'ner' with 5 values: LOC (seen 334 times), ORG (seen 301 times), PER (seen 277 times), MISC (seen 182 times)
Dictionary with 5 tags: <unk>, LOC, ORG, PER, MISC
2022-06-30 09:41:59,732 SequenceTagger predicts: Dictionary with 17 tags: O, S-LOC, B-LOC, E-LOC, I-LOC, S-ORG, B-ORG, E-ORG, I-ORG, S-PER, B-PER, E-PER, I-PER, S-MISC, B-MISC, E-MISC, I-MISC
2022-06-30 09:41:59,981 tensorboard logging path is experiments/bug
2022-06-30 09:41:59,982 ----------------------------------------------------------------------------------------------------
2022-06-30 09:41:59,982 Model: "SequenceTagger(
  (embeddings): StackedEmbeddings(
    (list_embedding_0): FlairEmbeddings(
      (lm): LanguageModel(
        (drop): Dropout(p=0.05, inplace=False)
        (encoder): Embedding(300, 100)
        (rnn): LSTM(100, 2048)
        (decoder): Linear(in_features=2048, out_features=300, bias=True)
      )
    )
    (list_embedding_1): FlairEmbeddings(
      (lm): LanguageModel(
        (drop): Dropout(p=0.05, inplace=False)
        (encoder): Embedding(300, 100)
        (rnn): LSTM(100, 2048)
        (decoder): Linear(in_features=2048, out_features=300, bias=True)
      )
    )
  )
  (word_dropout): WordDropout(p=0.05)
  (locked_dropout): LockedDropout(p=0.5)
  (embedding2nn): Linear(in_features=4096, out_features=4096, bias=True)
  (rnn): LSTM(4096, 256, batch_first=True, bidirectional=True)
  (linear): Linear(in_features=512, out_features=19, bias=True)
  (loss_function): ViterbiLoss()
  (crf): CRF()
)"
2022-06-30 09:41:59,983 ----------------------------------------------------------------------------------------------------
2022-06-30 09:41:59,983 Corpus: "Corpus: 749 train + 173 dev + 184 test sentences"
2022-06-30 09:41:59,983 ----------------------------------------------------------------------------------------------------
2022-06-30 09:41:59,983 Parameters:
2022-06-30 09:41:59,983  - learning_rate: "0.100000"
2022-06-30 09:41:59,983  - mini_batch_size: "32"
2022-06-30 09:41:59,983  - patience: "3"
2022-06-30 09:41:59,983  - anneal_factor: "0.5"
2022-06-30 09:41:59,983  - max_epochs: "150"
2022-06-30 09:41:59,983  - shuffle: "True"
2022-06-30 09:41:59,983  - train_with_dev: "False"
2022-06-30 09:41:59,983  - batch_growth_annealing: "False"
2022-06-30 09:41:59,983 ----------------------------------------------------------------------------------------------------
2022-06-30 09:41:59,983 Model training base path: "experiments/bug"
2022-06-30 09:41:59,984 ----------------------------------------------------------------------------------------------------
2022-06-30 09:41:59,984 Device: cuda:0
2022-06-30 09:41:59,984 ----------------------------------------------------------------------------------------------------
2022-06-30 09:41:59,984 Embeddings storage mode: cpu
2022-06-30 09:41:59,984 ----------------------------------------------------------------------------------------------------
2022-06-30 09:42:00,560 epoch 1 - iter 2/24 - loss 4.06095307 - samples/sec: 111.28 - lr: 0.100000
2022-06-30 09:42:01,133 epoch 1 - iter 4/24 - loss 3.38150115 - samples/sec: 111.65 - lr: 0.100000
2022-06-30 09:42:01,717 epoch 1 - iter 6/24 - loss 2.64457018 - samples/sec: 109.78 - lr: 0.100000
2022-06-30 09:42:02,213 epoch 1 - iter 8/24 - loss 2.29768349 - samples/sec: 128.95 - lr: 0.100000
2022-06-30 09:42:02,580 epoch 1 - iter 10/24 - loss 2.02568397 - samples/sec: 174.88 - lr: 0.100000
2022-06-30 09:42:03,167 epoch 1 - iter 12/24 - loss 1.79615879 - samples/sec: 109.02 - lr: 0.100000
2022-06-30 09:42:03,745 epoch 1 - iter 14/24 - loss 1.62863061 - samples/sec: 110.84 - lr: 0.100000
2022-06-30 09:42:04,381 epoch 1 - iter 16/24 - loss 1.46143820 - samples/sec: 100.69 - lr: 0.100000
2022-06-30 09:42:04,896 epoch 1 - iter 18/24 - loss 1.37829819 - samples/sec: 124.43 - lr: 0.100000
2022-06-30 09:42:05,578 epoch 1 - iter 20/24 - loss 1.24339454 - samples/sec: 93.86 - lr: 0.100000
2022-06-30 09:42:06,198 epoch 1 - iter 22/24 - loss 1.17003827 - samples/sec: 103.35 - lr: 0.100000
2022-06-30 09:42:06,849 epoch 1 - iter 24/24 - loss 1.12595830 - samples/sec: 98.26 - lr: 0.100000
2022-06-30 09:42:06,850 ----------------------------------------------------------------------------------------------------
2022-06-30 09:42:06,850 EPOCH 1 done: loss 1.1260 - lr 0.100000
Traceback (most recent call last):
  File "experiments/bug.py", line 39, in <module>
    trainer.train(
  File "/path/to/project/.venv/lib/python3.8/site-packages/flair/trainers/trainer.py", line 618, in train
    train_part_eval_result.classification_report[metric_class_avg_type][metric_type],
UnboundLocalError: local variable 'train_part_eval_result' referenced before assignment

Environment

  • OS: Linux
  • Version: Flair 0.11.3

GuiGel, Jun 30 '22 08:06