ONNX files for T5 model with text2text-generation-with-past task do not work
### System Info
Reproduced on Mac (Python 3.11) and Google Colab (Python 3.10)
optimum==1.14.0
### Who can help?

@michaelbenayoun
### Information
- [ ] The official example scripts
- [X] My own modified scripts
### Tasks
- [ ] An officially supported task in the `examples` folder (such as GLUE/SQuAD, ...)
- [X] My own task or dataset (give details below)
### Reproduction (minimal, reproducible, runnable)
Full colab here
When I export the model with past, it does not work:
```bash
!optimum-cli export onnx \
  --model jbochi/madlad400-3b-mt \
  --task text2text-generation-with-past \
  --optimize O3 \
  onnx/
```
```python
from optimum.onnxruntime import ORTModelForSeq2SeqLM
from transformers import T5Tokenizer

model = ORTModelForSeq2SeqLM.from_pretrained('./onnx', device="auto")
tokenizer = T5Tokenizer.from_pretrained('jbochi/madlad400-3b-mt')

text = "<2pt> I love pizza!"
inputs = tokenizer(text, return_tensors="pt", device=model.device)
outputs = model.generate(**inputs)
tokenizer.decode(outputs[0], skip_special_tokens=True)
```
It raises the following error:
```
---------------------------------------------------------------------------
InvalidArgument                           Traceback (most recent call last)
<ipython-input-14-be8c5bdea41e> in <cell line: 3>()
      1 text = "<2pt> I love pizza!"
      2 inputs = tokenizer(text, return_tensors="pt", device=model.device)
----> 3 outputs = model.generate(**inputs)
      4 tokenizer.decode(outputs[0], skip_special_tokens=True)

7 frames
/usr/local/lib/python3.10/dist-packages/onnxruntime/capi/onnxruntime_inference_collection.py in run(self, output_names, input_feed, run_options)
    218         output_names = [output.name for output in self._outputs_meta]
    219         try:
--> 220             return self._sess.run(output_names, input_feed, run_options)
    221         except C.EPFail as err:
    222             if self._enable_fallback:

InvalidArgument: [ONNXRuntimeError] : 2 : INVALID_ARGUMENT : Got invalid dimensions for input: past_key_values.9.encoder.key for the following indices
 index: 3 Got: 64 Expected: 128
 Please fix either the inputs or the model.
```
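For reference, a quick way to see what the exported decoder actually declares for those inputs is to open a session and print its input shapes. This is a minimal sketch, assuming the merged decoder in `onnx/` is the graph being executed:

```python
import onnxruntime as ort

# Print the shapes the exported decoder declares for its KV-cache inputs;
# index 3 of past_key_values.*.encoder.key is the dimension the error is about.
sess = ort.InferenceSession(
    "onnx/decoder_model_merged.onnx",
    providers=["CPUExecutionProvider"],
)
for inp in sess.get_inputs():
    if "past_key_values" in inp.name:
        print(inp.name, inp.shape)
```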
### Expected behavior
If I convert this T5 model with no past / cache, it works:
```bash
!optimum-cli export onnx \
  --model jbochi/madlad400-3b-mt \
  --task text2text-generation \
  --optimize O3 \
  onnx-no-past/
```
```python
from optimum.onnxruntime import ORTModelForSeq2SeqLM
from transformers import T5Tokenizer

model = ORTModelForSeq2SeqLM.from_pretrained('./onnx-no-past', use_cache=False)
tokenizer = T5Tokenizer.from_pretrained('jbochi/madlad400-3b-mt')

text = "<2pt> I love pizza!"
inputs = tokenizer(text, return_tensors="pt", device=model.device)
outputs = model.generate(**inputs)
tokenizer.decode(outputs[0], skip_special_tokens=True)
# Eu amo pizza!
```
Thank you!

---

I've hit the same issue. Any progress?

---

It seems to be related to this PR: https://github.com/huggingface/optimum/pull/1257

It worked for me when I removed the merged decoder.
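If it helps anyone, "removing the merged decoder" can also be expressed at load time instead of by deleting files. A minimal sketch, assuming the `use_merged` argument of `ORTModelForSeq2SeqLM.from_pretrained` behaves as documented:

```python
from optimum.onnxruntime import ORTModelForSeq2SeqLM

# Load decoder_model.onnx + decoder_with_past_model.onnx instead of
# decoder_model_merged.onnx, while keeping the KV cache enabled.
model = ORTModelForSeq2SeqLM.from_pretrained(
    "./onnx",
    use_cache=True,    # keep past key/values for faster decoding
    use_merged=False,  # skip the merged decoder graph
)
```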

---

Hi, I'm facing the same issue. It doesn't work even when I remove the merged decoder. I used the same command as @jbochi:
```bash
optimum-cli export onnx \
  --model madlad400-3b-mt \
  --task text2text-generation-with-past \
  --optimize O3 \
  onnx/
```
My onnx dir has these decoder variants:

```
decoder_model.onnx
decoder_model.onnx.data
decoder_model.onnx_data
decoder_with_past_model.onnx
decoder_with_past_model.onnx.data
decoder_with_past_model.onnx_data
decoder_model_merged.onnx
decoder_model_merged.onnx_data
```
I checked the `past_key_values` inputs of the original model and of the ONNX model (from the model graph) and confirmed that both expect a size of 128 at dimension index 3 of the attention KV tensors (the per-head key/value dimension).
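For anyone wanting to repeat that check, this is roughly what it looks like. A sketch that reads only the graph structure (not the multi-GB external weights) and compares it against the checkpoint config, assuming the checkpoint's T5 config exposes `d_kv`:

```python
import onnx
from transformers import AutoConfig

# The per-head key/value size the original checkpoint was trained with.
config = AutoConfig.from_pretrained("jbochi/madlad400-3b-mt")
print("config d_kv:", config.d_kv)

# The sizes the exported graph declares for the encoder KV-cache inputs.
model = onnx.load("onnx/decoder_model_merged.onnx", load_external_data=False)
for inp in model.graph.input:
    if "past_key_values" in inp.name and inp.name.endswith("encoder.key"):
        dims = [d.dim_param or d.dim_value for d in inp.type.tensor_type.shape.dim]
        print(inp.name, dims)
```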
I'm running `optimum[onnxruntime-gpu]==1.16.2`. The model is the same MADLAD T5 that @jbochi exported above.
As with @jbochi, it works if I load the model with `model = ORTModelForSeq2SeqLM.from_pretrained(model_name, provider="CUDAExecutionProvider", task="text2text-generation", use_cache=False, use_io_binding=False)`. But I want to load the version with the KV cache to get some speedup.