
Running an inference pipeline with a StarCoderBase model crashes after ONNX optimization

Open · BBerabi opened this issue 2 years ago · 1 comment

System Info

Optimum Version: 1.13.2
Platform: Ubuntu 22.04
Python Version: 3.10.2
Transformers Version: 4.34

Who can help?

@JingyaHuang @fxmarty @michaelbenayoun

Running an inference pipeline with an ONNX-optimized StarCoder model, or any other model using multi-query attention, crashes. I left very detailed comments in PR https://github.com/huggingface/optimum/pull/1042, which was meant to add support for this kind of model, but I think the code is not safe/robust enough. I highlighted all of the potentially problematic cases and bugs there; please have a look at the comments.

Information

  • [ ] The official example scripts
  • [X] My own modified scripts

Tasks

  • [ ] An officially supported task in the examples folder (such as GLUE/SQuAD, ...)
  • [x] My own task or dataset (give details below)

Reproduction (minimal, reproducible, runnable)

import torch
import transformers
from optimum.onnxruntime import ORTModelForCausalLM


my_checkpoint = "path_to_checkpoint"
device = torch.device("cuda:0")
model = ORTModelForCausalLM.from_pretrained(
        my_checkpoint,
        provider="CUDAExecutionProvider",
        export=True,
        trust_remote_code=True,
)
tokenizer = transformers.AutoTokenizer.from_pretrained(my_checkpoint)
pipeline = transformers.TextGenerationPipeline(tokenizer=tokenizer, model=model, device=device)
prompt = "some_input_text_to_the_model"
pipeline(prompt, num_workers=0, batch_size=1, num_return_sequences=5, num_beams=5)

Expected behavior

Successful generation of predictions after the call to the pipeline. Instead I get the error

'tuple' object has no attribute 'size'

BBerabi avatar Oct 23 '23 14:10 BBerabi
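The AttributeError above is consistent with cache-handling code calling `.size()` directly on a past-key-values entry. The following is a minimal illustrative sketch, not Optimum's actual code, using a hypothetical `FakeTensor` stand-in for `torch.Tensor`: the call succeeds when a layer's cache is a single fused tensor, as in multi-query attention models like StarCoder/GPT BigCode, but fails on the conventional `(key, value)` tuple layout.

```python
# Minimal sketch of the failure mode, NOT Optimum's actual code.
# MQA models (e.g. GPT BigCode) can cache one fused tensor per layer,
# while most decoders cache a (key, value) tuple per layer.

class FakeTensor:
    """Stand-in for torch.Tensor, exposing only a .size() method."""
    def __init__(self, *shape):
        self.shape = shape

    def size(self):
        return self.shape


fused_cache = FakeTensor(1, 4, 128)          # one tensor per layer (MQA layout)
tuple_cache = (FakeTensor(1, 8, 4, 64),      # conventional (key, value) pair
               FakeTensor(1, 8, 4, 64))

print(fused_cache.size())                    # works: (1, 4, 128)

try:
    tuple_cache.size()                       # code path that assumes a fused tensor
except AttributeError as err:
    print(err)                               # 'tuple' object has no attribute 'size'
```

Code written against one cache layout therefore has to branch on (or normalize) the other, which is the kind of robustness gap the PR comments point at.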

I created an issue on the Transformers side with a fuller stack trace. There is some speculation about the root cause there.

lidingsnyk avatar Feb 15 '24 11:02 lidingsnyk

@BBerabi @lidingsnyk Thank you for the details & apology for the late reply, it should be fixed by https://github.com/huggingface/optimum/pull/1722

import torch
import transformers
from optimum.onnxruntime import ORTModelForCausalLM

my_checkpoint = "hf-internal-testing/tiny-random-GPTBigCodeModel"
model = ORTModelForCausalLM.from_pretrained(
        my_checkpoint,
        export=True,
)
tokenizer = transformers.AutoTokenizer.from_pretrained(my_checkpoint)
pipeline = transformers.TextGenerationPipeline(tokenizer=tokenizer, model=model)
input = "some_input"
pipeline(input, num_workers=0, batch_size=1, num_return_sequences=5, num_beams=5, max_new_tokens=5)

now works as expected (with a dummy model) after the fix above.

fxmarty avatar Feb 26 '24 14:02 fxmarty

@fxmarty Thanks! Would love to verify this once it's merged. (I see it's possible to install from source.)

lidingsnyk avatar Feb 26 '24 16:02 lidingsnyk
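When verifying a fix via a source install, a quick way to confirm which Optimum build is actually active is to query the installed package metadata. This is a generic check (not part of this thread): the issue was reported against 1.13.2, so a newer build is needed to pick up the fix.

```python
# Print the installed Optimum version before re-running the reproduction,
# to confirm the source install (rather than an older release) is in use.
from importlib.metadata import PackageNotFoundError, version

try:
    print(version("optimum"))
except PackageNotFoundError:
    print("optimum is not installed")
```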

@fxmarty Thanks a lot for the fix! We are looking forward to it! :tada:

BBerabi avatar Feb 27 '24 12:02 BBerabi