Issue with special Unicode characters during inference of fine-tuned models
Describe the bug
We are using a fine-tuned XLM-R model for NER in our use case, hosted through FastAPI in Python and loaded through the flair framework. When the input contains special characters such as '\u200d' or '\x9f', inference fails, the CPU utilization of the machine running the app drops to zero, and the app stops accepting further inference requests.
To Reproduce
from flair.data import Sentence
from flair.models import SequenceTagger

# input containing a zero-width joiner (\u200d), wrapped in flair Sentence objects
input_sentences = [Sentence('This is an example \u200d sentence')]

flair_trained_model = SequenceTagger.load('path to the pretrained model')
flair_trained_model.predict(input_sentences, mini_batch_size=32)
Expected behavior
Ideally, inference should complete without errors. For unsupported inputs, it should at least return a proper error response, and a single failing request should not affect further requests; at the moment the CPU utilization drops to zero and the service stops serving.
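For context, here is a rough sketch of the kind of guard we would like to be able to rely on around the call; the endpoint name, model path, and error payload are illustrative assumptions, not our production code:

from fastapi import FastAPI, HTTPException
from flair.data import Sentence
from flair.models import SequenceTagger

app = FastAPI()
tagger = SequenceTagger.load('path to the pretrained model')  # placeholder path

@app.post('/ner')
def ner(text: str):
    sentence = Sentence(text)
    try:
        # this is the call that currently fails and leaves the worker idle
        tagger.predict(sentence, mini_batch_size=32)
    except RuntimeError as exc:
        # surface a proper error instead of silently stalling the service
        raise HTTPException(status_code=422, detail=f'inference failed: {exc}')
    return {'entities': [span.text for span in sentence.get_spans('ner')]}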
Logs and Stack traces
File "/opt/deployment/flair-ner/src/flair_ner/flair_model_prediction.py", line 59, in flair_inference
self.multi_flair_model.predict(metadata_sentences, mini_batch_size=self.batch_size)
File "/usr/local/lib/python3.6/site-packages/flair/models/sequence_tagger_model.py", line 379, in predict
feature = self.forward(batch)
File "/usr/local/lib/python3.6/site-packages/flair/models/sequence_tagger_model.py", line 672, in forward
self.embeddings.embedding_length,
RuntimeError: shape '[68, 13, 768]' is invalid for input of size 677376
Screenshots
No response
Additional Context
No response
Environment
Versions:
Flair: 0.12.2
Pytorch: 2.0.1+cpu
Transformers: 4.30.2
GPU: False
Hi @Capt4in-Levi, upgrading flair to the newest version should fix the problem for you.
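If upgrading is not immediately possible, one possible interim workaround, purely a sketch on the assumption that the zero-width and control characters are what break the batch reshaping, is to strip Unicode format (Cf) and control (Cc) characters from the text before building sentences:

import unicodedata

from flair.data import Sentence

def sanitize(text: str) -> str:
    # remove Unicode format (Cf) and control (Cc) characters such as '\u200d' and '\x9f'
    return ''.join(ch for ch in text if unicodedata.category(ch) not in ('Cf', 'Cc'))

sentence = Sentence(sanitize('This is an example \u200d sentence'))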