Missing `token_type_ids` when the tokenizer and model architecture differ

Open ZeusFSX opened this issue 11 months ago • 0 comments

System Info

optimum==1.17.1

Who can help?

@philschmid @michaelbenayoun @JingyaHuang @echarlaix

Information

  • [ ] The official example scripts
  • [X] My own modified scripts

Tasks

  • [ ] An officially supported task in the examples folder (such as GLUE/SQuAD, ...)
  • [X] My own task or dataset (give details below)

Reproduction (minimal, reproducible, runnable)

Hi, I ran into the problem described here: 'token_type_ids' is missing from the model inputs.

The root of the problem is using a tokenizer that does not match the model architecture. A BERT model expects token_type_ids in addition to input_ids and attention_mask, but an XLM-RoBERTa tokenizer does not return them. For example, intfloat/multilingual-e5-small has a BertModel architecture but uses XLMRobertaTokenizer, so token_type_ids is missing when we convert the model to ONNX or OpenVINO and try to run it with optimum.

For example, I fine-tuned intfloat/multilingual-e5-small for a token-classification task, and it works fine with the transformers library.

But when I try to use a pipeline with ORTModelForTokenClassification, token_type_ids is not added to the model inputs, because the tokenizer returns only input_ids and attention_mask.

The code that causes the problem is here

When I add it manually, everything works. This workaround is enough:

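from optimum.onnxruntime import ORTModelForTokenClassification
from transformers import XLMRobertaTokenizerFast

tokenizer = XLMRobertaTokenizerFast.from_pretrained('models/small')
model = ORTModelForTokenClassification.from_pretrained('models/small-onnx/')
inputs = tokenizer('some text', return_tensors='pt')
# Supply the missing input by reusing attention_mask. This gives every non-padding
# token type id 1; transformers defaults to all zeros when token_type_ids is absent.
inputs['token_type_ids'] = inputs['attention_mask']
model(**inputs)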

Maybe we should inspect the model's inputs and decide whether to add token_type_ids based on what the model expects, not on the tokenizer output. Or we could do what the transformers library does. Here is the link

The transformers library fills token_type_ids with zeros when they are not passed in.

The same problem applies to OpenVINO's OVModelForTokenClassification.
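
The suggestion above (decide from the inputs the model declares, and zero-fill as transformers does) could look roughly like the sketch below. It is only an illustration: `add_missing_token_type_ids` and `list_zeros_like` are hypothetical names, not part of optimum or transformers, and the list of expected input names is assumed to come from the exported model.

```python
from typing import Any, Callable, Dict, Iterable


def add_missing_token_type_ids(
    inputs: Dict[str, Any],
    expected_input_names: Iterable[str],
    zeros_like: Callable[[Any], Any],
) -> Dict[str, Any]:
    """Return a copy of `inputs`, adding zero-filled token_type_ids if the model expects them.

    Hypothetical helper for illustration; not an optimum or transformers API.
    """
    expected = set(expected_input_names)
    patched = dict(inputs)  # shallow copy, the caller's mapping is left untouched
    if "token_type_ids" in expected and "token_type_ids" not in patched:
        # Same shape as input_ids, all zeros (the default segment id).
        patched["token_type_ids"] = zeros_like(patched["input_ids"])
    return patched


def list_zeros_like(batch):
    """zeros_like for plain nested lists, used here to keep the example dependency-free."""
    return [[0] * len(row) for row in batch]


# An XLM-RoBERTa-style tokenizer output: no token_type_ids.
tokenizer_output = {"input_ids": [[0, 1234, 2]], "attention_mask": [[1, 1, 1]]}

# A BERT-style exported model declares three inputs.
bert_inputs = add_missing_token_type_ids(
    tokenizer_output,
    ["input_ids", "attention_mask", "token_type_ids"],
    list_zeros_like,
)
print(bert_inputs["token_type_ids"])  # [[0, 0, 0]]
```

With PyTorch tensors, `torch.zeros_like` can be passed as `zeros_like`. For an ONNX model, the expected names can be read from an onnxruntime session with `[i.name for i in session.get_inputs()]`. How that session is reached from an optimum model object is not shown here.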

Expected behavior

The forward pass should behave the same as in the transformers library.

ZeusFSX, Mar 15 '24 14:03