[Question]: CSVClassificationCorpus and tagger
Question
I trained a custom model with:

```python
from flair.datasets import CSVClassificationCorpus
from flair.embeddings import CharacterEmbeddings, StackedEmbeddings
from flair.models import SequenceTagger
from flair.trainers import ModelTrainer

# model_name and num_epochs are assumed to be defined earlier (not shown)
tag_type = 'label'
# column 0 of the CSV holds the text, column 1 holds the label
column_name_map = {0: 'text', 1: tag_type}
corpus = CSVClassificationCorpus("input/test", train_file='text.txt', column_name_map=column_name_map,
                                 skip_header=True, delimiter=',', label_type=tag_type)
tag_dictionary = corpus.make_label_dictionary(label_type=tag_type, add_unk=True)
print(tag_dictionary)

char_embeddings = CharacterEmbeddings()
embeddings = StackedEmbeddings([char_embeddings])

tagger = SequenceTagger(hidden_size=256,
                        embeddings=embeddings,
                        tag_dictionary=tag_dictionary,
                        tag_type=tag_type,
                        use_crf=True)

trainer = ModelTrainer(tagger, corpus)
trainer.train('resources/taggers/' + model_name,
              learning_rate=0.1,
              mini_batch_size=32,
              max_epochs=num_epochs)

model_path = 'models/flair/' + model_name
tagger.save(model_path)
```
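To make the mismatch between the two formats concrete, here is a small pure-Python sketch (no flair needed) that reads rows in the layout implied by `column_name_map = {0: 'text', 1: 'label'}` with a header row. The CSV content and the label names are hypothetical. Each row yields a single label for the whole text, whereas a sequence tagger is trained on one tag per token.

```python
import csv
import io

# Hypothetical two-row CSV in the layout implied by
# column_name_map = {0: 'text', 1: 'label'} with skip_header=True.
CSV_DATA = """text,label
Angela Merkel visited Paris,POLITICS
The match ended in a draw,SPORTS
"""


def read_classification_rows(handle):
    """Yield (text, label) pairs: one label per whole text, as in a classification CSV."""
    reader = csv.reader(handle, delimiter=",")
    next(reader)  # skip the header row
    for text, label in reader:
        yield text, label


rows = list(read_classification_rows(io.StringIO(CSV_DATA)))
for text, label in rows:
    tokens = text.split()
    # A classification row gives one label for all tokens; a sequence tagger
    # instead needs one tag per token, e.g. ['B-PER', 'I-PER', 'O', 'B-LOC'].
    print(f"{len(tokens)} tokens -> 1 label ({label})")
```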
When I try to tag a sentence, I get an empty list (`[]`):

```python
from flair.data import Sentence
from flair.models import SequenceTagger


def tag_text_with_ner(model_path, text):
    tagger = SequenceTagger.load(model_path)
    sentence = Sentence(text)
    tagger.predict(sentence)
    tagged_entities = []
    # collect every predicted span of type 'ner' with its tag and confidence
    for entity in sentence.get_spans('ner'):
        tagged_entities.append((entity.text, entity.tag, entity.score))
    return tagged_entities


tagged_entities = tag_text_with_ner(model_path, text)
print(tagged_entities)
```
The output is:

```
2024-01-11 14:34:55,577 SequenceTagger predicts: Dictionary with 15 tags: <unk>, ...
[]
```
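One plausible explanation for the empty list (an assumption, not verified against flair's source): `get_spans` returns spans, and spans are normally decoded from span-style tags such as `B-PER`/`I-PER`. The sketch below is a generic BIO decoder written for illustration, not flair's implementation. It shows that tags without a `B-`/`I-` prefix, like the class names in a classification label dictionary, never form a span.

```python
from typing import List, Optional, Tuple


def bio_to_spans(tokens: List[str], tags: List[str]) -> List[Tuple[str, Optional[str]]]:
    """Group BIO tags into (span_text, entity_type) pairs.

    Generic decoder for illustration only. Tags without a B-/I- prefix
    (e.g. 'O' or a plain class name) never start or extend a span.
    """
    spans: List[Tuple[str, Optional[str]]] = []
    current_tokens: List[str] = []
    current_type: Optional[str] = None
    for token, tag in zip(tokens, tags):
        if tag.startswith("B-") or (tag.startswith("I-") and current_type != tag[2:]):
            # A new span starts; flush any open span first.
            if current_tokens:
                spans.append((" ".join(current_tokens), current_type))
            current_tokens, current_type = [token], tag[2:]
        elif tag.startswith("I-"):
            # Continue the open span of the same type.
            current_tokens.append(token)
        else:
            # 'O' or any unprefixed label closes the open span.
            if current_tokens:
                spans.append((" ".join(current_tokens), current_type))
            current_tokens, current_type = [], None
    if current_tokens:
        spans.append((" ".join(current_tokens), current_type))
    return spans


tokens = ["Angela", "Merkel", "visited", "Paris"]
print(bio_to_spans(tokens, ["B-PER", "I-PER", "O", "B-LOC"]))
# [('Angela Merkel', 'PER'), ('Paris', 'LOC')]
print(bio_to_spans(tokens, ["POLITICS"] * 4))
# []
```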
I have changed `'ner'` to `'label'`, with no difference. Training worked fine with `ColumnCorpus`, but I need to train from a CSV file, not from BIO files.
I suspect the issue is sequence labeling vs. text classification. Still, I was wondering whether NER training can be done from a CSV file instead of BIO.
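If the data can be annotated per token, one possible workaround is to keep the annotations in a CSV and convert them to a column format before training. This sketch assumes a hypothetical CSV layout with `sentence_id`, `token` and `tag` columns, rows already sorted by sentence, and tokens that contain no spaces. It writes one `token tag` pair per line with a blank line between sentences. A CSV with one label per text, like the one used for `CSVClassificationCorpus` here, carries no token-level tags and cannot be converted this way.

```python
import csv
import io
from itertools import groupby


def csv_to_column_format(csv_handle, out_handle):
    """Convert token-level CSV rows (sentence_id, token, tag) into a
    whitespace-separated two-column file with a blank line between sentences.

    Assumes rows of the same sentence are consecutive (groupby only merges
    adjacent rows) and that tokens contain no whitespace.
    """
    reader = csv.DictReader(csv_handle)
    for _, rows in groupby(reader, key=lambda row: row["sentence_id"]):
        for row in rows:
            out_handle.write(f"{row['token']} {row['tag']}\n")
        out_handle.write("\n")  # blank line marks the end of a sentence


# Hypothetical token-level CSV
csv_data = """sentence_id,token,tag
1,Angela,B-PER
1,Merkel,I-PER
1,visited,O
1,Paris,B-LOC
2,Hello,O
"""
out = io.StringIO()
csv_to_column_format(io.StringIO(csv_data), out)
print(out.getvalue())
```

The written file could then be loaded with `ColumnCorpus` using `column_format={0: 'text', 1: 'ner'}`, the same loader that already worked for training.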