Different results between Optimum and ONNX
System Info
torch==1.10.1+cu111
onnx==1.11.0
onnxruntime==1.11.1
optimum==1.2.3
Python 3.8.10
OS: Ubuntu 20.04
Who can help?
@echarlaix @JingyaHuang
Information
- [ ] The official example scripts
- [X] My own modified scripts
Tasks
- [ ] An officially supported task in the examples folder (such as GLUE/SQuAD, ...)
- [X] My own task or dataset (give details below)
Reproduction
First, I created the Optimum model:
from functools import partial

from datasets import Dataset
from optimum.onnxruntime import ORTModel, ORTQuantizer
from optimum.onnxruntime.configuration import AutoQuantizationConfig

qconfig = AutoQuantizationConfig.arm64(is_static=False, per_channel=False)
quantizer = ORTQuantizer.from_pretrained("model", feature="sequence-classification")
quantizer.export(
    onnx_model_path="model.onnx",
    onnx_quantized_model_output_path="model-quantized.onnx",
    quantization_config=qconfig,
)
ort_model = ORTModel("model.onnx", quantizer._onnx_config)

# Create a dataset or load one from the Hub
ds = Dataset.from_dict({"sentence": list(val_df.tweet)})

# Tokenize the inputs
def preprocess_fn(ex, tokenizer):
    return tokenizer(ex["sentence"], padding='max_length', truncation=True)

tokenized_ds = ds.map(partial(preprocess_fn, tokenizer=quantizer.tokenizer))
ort_outputs = ort_model.evaluation_loop(tokenized_ds)
pred = ort_outputs.predictions
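For completeness, a minimal sketch of how these predictions could be scored, assuming val_df also has a hypothetical label column holding the gold labels (it is not shown in the snippet above):

import numpy as np

# `val_df.label` is a hypothetical column with the gold labels for the tweets
labels = np.array(val_df.label)
preds = np.argmax(ort_outputs.predictions, axis=1)
print("Accuracy using Optimum:", (preds == labels).astype(np.float32).mean().item())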
This is how I created the ONNX model:
!python -m transformers.onnx --model=model --feature=sequence-classification onnx/

from onnxruntime import InferenceSession
from transformers import AutoTokenizer

model_name = "distilbert-base-uncased-finetuned-sst-2-english"
tokenizer = AutoTokenizer.from_pretrained(model_name)
session = InferenceSession("onnx/model.onnx")
tokens = tokenizer(list(val_df.tweet), padding='max_length', truncation=True, return_tensors="np")
outputs = session.run(output_names=["logits"], input_feed=dict(tokens))
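One way to compare the two pipelines directly is to look at the raw logits on the same inputs; if they are (near) identical, any accuracy difference comes from the evaluation code rather than from the exported models. A minimal sketch, assuming pred and outputs from the snippets above are both available:

import numpy as np

# Logits from the Optimum evaluation loop vs. the vanilla InferenceSession
optimum_logits = np.asarray(pred)
vanilla_logits = outputs[0]
print("Max absolute logit difference:", np.abs(optimum_logits - vanilla_logits).max())
print("Prediction agreement:", (optimum_logits.argmax(axis=1) == vanilla_logits.argmax(axis=1)).mean())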
The accuracy of the Optimum model is 4% lower than that of the ONNX model, while the ONNX model's accuracy matches the PyTorch model.
Expected behavior
The accuracy of the Optimum model should be close to that of the ONNX/PyTorch model.
Hello, I could not reproduce the issue with
torch==1.11.0+cu113
onnx==1.11.0
onnxruntime==1.11.1
optimum==main
Python 3.9.12
Could you try the following self-contained script?
Run python -m transformers.onnx --model=distilbert-base-uncased-finetuned-sst-2-english --feature=sequence-classification output_dir and then (after fixing the model paths):
from optimum.onnxruntime.configuration import AutoQuantizationConfig
from optimum.onnxruntime import ORTQuantizer
from optimum.onnxruntime import ORTModel
from datasets import load_dataset
from transformers import EvalPrediction
import numpy as np
from functools import partial
from onnxruntime import InferenceSession
from transformers import AutoTokenizer
model_name = "distilbert-base-uncased-finetuned-sst-2-english"
optimum_model_path = "path/to/optimum_model.onnx"
onnx_model_path = "path/to/model.onnx"
def compute_metrics(p: EvalPrediction):
    preds = p.predictions[0] if isinstance(p.predictions, tuple) else p.predictions
    preds = np.argmax(preds, axis=1)
    return {"accuracy": (preds == p.label_ids).astype(np.float32).mean().item()}
# Tokenize the inputs
def preprocess_fn(ex, tokenizer):
    return tokenizer(ex["sentence"], padding='max_length', truncation=True)
ds = load_dataset("glue", "sst2")
ds = ds["validation"]
ds = ds.select(range(50))
quantizer = ORTQuantizer.from_pretrained(model_name, feature="sequence-classification")
tokenized_ds = ds.map(partial(preprocess_fn, tokenizer=quantizer.tokenizer))
# Inference with Optimum
# may as well use AutoQuantizationConfig.avx2() here
qconfig = AutoQuantizationConfig.arm64(is_static=False, per_channel=False)
quantizer.export(
    onnx_model_path=optimum_model_path,
    onnx_quantized_model_output_path="/tmp/unused.onnx",
    quantization_config=qconfig,
)
ort_model_eval = ORTModel(
    optimum_model_path,
    quantizer._onnx_config,
    compute_metrics=compute_metrics,
    label_names=["label"],
)
outputs = ort_model_eval.evaluation_loop(tokenized_ds)
print("Accuracy using Optimum:", outputs.metrics)
# Inference with vanilla InferenceSession
tokenizer = AutoTokenizer.from_pretrained(model_name)
session = InferenceSession(onnx_model_path)
tokens = tokenizer(list(ds["sentence"]), padding='max_length', truncation=True, return_tensors="np")
outputs = session.run(output_names=["logits"], input_feed=dict(tokens))
metrics_vanilla = compute_metrics(EvalPrediction(predictions=outputs[0], label_ids=np.array(ds["label"])))
print("Accuracy using vanilla ONNX Runtime:", metrics_vanilla)
I am trying to produce a reproducible script, but the weird thing is that in Colab I got the same result with ONNX and Optimum, while on my local system I did not, even though the script is exactly the same.
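In case it helps, a small sketch for comparing the two environments (package versions and the execution providers ONNX Runtime can use), which is often where a Colab and a local setup diverge:

import importlib.metadata as md
import onnxruntime

# Print the versions of the packages involved in the export/inference pipeline
for pkg in ["torch", "transformers", "onnx", "onnxruntime", "optimum"]:
    print(pkg, md.version(pkg))

# Execution providers available on this machine (e.g. CPUExecutionProvider, CUDAExecutionProvider)
print("providers:", onnxruntime.get_available_providers())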
The issue seems to have been solved, closing it. Feel free to re-open if there are further questions.