Different results between Optimum and ONNX
System Info
torch==1.10.1+cu111
onnx==1.11.0
onnxruntime==1.11.1
optimum==1.2.3
Python 3.8.10
OS: Ubuntu 20.04
Who can help?
@echarlaix @JingyaHuang
Information
- [ ] The official example scripts
- [X] My own modified scripts
Tasks
- [ ] An officially supported task in the examples folder (such as GLUE/SQuAD, ...)
- [X] My own task or dataset (give details below)
Reproduction
First, I created the Optimum model:
from functools import partial

from datasets import Dataset
from optimum.onnxruntime import ORTModel, ORTQuantizer
from optimum.onnxruntime.configuration import AutoQuantizationConfig

qconfig = AutoQuantizationConfig.arm64(is_static=False, per_channel=False)
quantizer = ORTQuantizer.from_pretrained("model", feature="sequence-classification")
quantizer.export(
    onnx_model_path="model.onnx",
    onnx_quantized_model_output_path="model-quantized.onnx",
    quantization_config=qconfig,
)
ort_model = ORTModel("model.onnx", quantizer._onnx_config)

# Create a dataset or load one from the Hub
ds = Dataset.from_dict({"sentence": list(val_df.tweet)})

# Tokenize the inputs
def preprocess_fn(ex, tokenizer):
    return tokenizer(ex["sentence"], padding='max_length', truncation=True)

tokenized_ds = ds.map(partial(preprocess_fn, tokenizer=quantizer.tokenizer))
ort_outputs = ort_model.evaluation_loop(tokenized_ds)
pred = ort_outputs.predictions
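For completeness, a minimal sketch of how these predictions could be scored, assuming val_df also has a hypothetical label column holding the gold labels (it is not shown in the snippet above):

import numpy as np

# `val_df.label` is a hypothetical column with the gold labels for the tweets
labels = np.array(val_df.label)
preds = np.argmax(ort_outputs.predictions, axis=1)
print("Accuracy using Optimum:", (preds == labels).astype(np.float32).mean().item())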
This is how I created the ONNX model:
!python -m transformers.onnx --model=model --feature=sequence-classification onnx/

from onnxruntime import InferenceSession
from transformers import AutoTokenizer

model_name = "distilbert-base-uncased-finetuned-sst-2-english"
tokenizer = AutoTokenizer.from_pretrained(model_name)
session = InferenceSession("onnx/model.onnx")
tokens = tokenizer(list(val_df.tweet), padding='max_length', truncation=True, return_tensors="np")
outputs = session.run(output_names=["logits"], input_feed=dict(tokens))
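One way to compare the two pipelines directly is to look at the raw logits on the same inputs; if they are (near) identical, any accuracy difference comes from the evaluation code rather than from the exported models. A minimal sketch, assuming pred and outputs from the snippets above are both available:

import numpy as np

# Logits from the Optimum evaluation loop vs. the vanilla InferenceSession
optimum_logits = np.asarray(pred)
vanilla_logits = outputs[0]
print("Max absolute logit difference:", np.abs(optimum_logits - vanilla_logits).max())
print("Prediction agreement:", (optimum_logits.argmax(axis=1) == vanilla_logits.argmax(axis=1)).mean())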
The accuracy of the Optimum model is 4% lower than that of the ONNX model, while the ONNX model's accuracy matches the PyTorch model.
Expected behavior
The accuracy of the Optimum model should be close to that of the ONNX/PyTorch model.
Hello, I could not reproduce the issue with
torch==1.11.0+cu113
onnx==1.11.0
onnxruntime==1.11.1
optimum==main
Python 3.9.12
Could you try the following self-contained script?
Run python -m transformers.onnx --model=distilbert-base-uncased-finetuned-sst-2-english --feature=sequence-classification output_dir and then (after fixing the model paths):
from optimum.onnxruntime.configuration import AutoQuantizationConfig
from optimum.onnxruntime import ORTQuantizer
from optimum.onnxruntime import ORTModel
from datasets import load_dataset
from transformers import EvalPrediction
import numpy as np
from functools import partial
from onnxruntime import InferenceSession
from transformers import AutoTokenizer
model_name = "distilbert-base-uncased-finetuned-sst-2-english"
optimum_model_path = "path/to/optimum_model.onnx"
onnx_model_path = "path/to/model.onnx"
def compute_metrics(p: EvalPrediction):
    preds = p.predictions[0] if isinstance(p.predictions, tuple) else p.predictions
    preds = np.argmax(preds, axis=1)
    return {"accuracy": (preds == p.label_ids).astype(np.float32).mean().item()}
# Tokenize the inputs
def preprocess_fn(ex, tokenizer):
    return tokenizer(ex["sentence"], padding='max_length', truncation=True)
ds = load_dataset("glue", "sst2")
ds = ds["validation"]
ds = ds.select(range(50))
quantizer = ORTQuantizer.from_pretrained(model_name, feature="sequence-classification")
tokenized_ds = ds.map(partial(preprocess_fn, tokenizer=quantizer.tokenizer))
# Inference with Optimum
# may as well use AutoQuantizationConfig.avx2() here
qconfig = AutoQuantizationConfig.arm64(is_static=False, per_channel=False)
quantizer.export(
    onnx_model_path=optimum_model_path,
    onnx_quantized_model_output_path="/tmp/unused.onnx",
    quantization_config=qconfig,
)
ort_model_eval = ORTModel(
    optimum_model_path,
    quantizer._onnx_config,
    compute_metrics=compute_metrics,
    label_names=["label"],
)
outputs = ort_model_eval.evaluation_loop(tokenized_ds)
print("Accuracy using Optimum:", outputs.metrics)
# Inference with vanilla InferenceSession
tokenizer = AutoTokenizer.from_pretrained(model_name)
session = InferenceSession(onnx_model_path)
tokens = tokenizer(list(ds["sentence"]), padding='max_length', truncation=True, return_tensors="np")
outputs = session.run(output_names=["logits"], input_feed=dict(tokens))
metrics_vanilla = compute_metrics(EvalPrediction(predictions=outputs[0], label_ids=np.array(ds["label"])))
print("Accuracy using vanilla ONNX Runtime:", metrics_vanilla)
I am trying to produce a reproducible script, but the weird thing is that in Colab I got the same result with ONNX and Optimum, while on my local system I did not, even though the script is exactly the same.
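In case it helps, a small sketch for comparing the two environments (package versions and the execution providers ONNX Runtime can use), which is often where a Colab and a local setup diverge:

import importlib.metadata as md
import onnxruntime

# Print the versions of the packages involved in the export/inference pipeline
for pkg in ["torch", "transformers", "onnx", "onnxruntime", "optimum"]:
    print(pkg, md.version(pkg))

# Execution providers available on this machine (e.g. CPUExecutionProvider, CUDAExecutionProvider)
print("providers:", onnxruntime.get_available_providers())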
The issue seems to have been solved, closing it. Feel free to re-open if there are further questions.