Wrong output from converted PyTorch model
🐞Describe the bug
The outputs of the PyTorch and Core ML models show a large mismatch, even with a generous tolerance.
To Reproduce
```python
import numpy as np
import torch
from transformers import AutoTokenizer, AutoModel
import coremltools as ct

sentences = ["This is a test."]

tokenizer = AutoTokenizer.from_pretrained('sentence-transformers/paraphrase-multilingual-MiniLM-L12-v2')
model = AutoModel.from_pretrained('sentence-transformers/paraphrase-multilingual-MiniLM-L12-v2', torchscript=True).eval()

# Trace the model, then script the traced module
encoded_input = tokenizer(sentences, return_tensors='pt')
traced_model = torch.jit.trace(model, tuple(encoded_input.values()))
scripted_model = torch.jit.script(traced_model)

# Convert to Core ML with flexible input shapes
mlmodel = ct.convert(
    scripted_model,
    source="pytorch",
    inputs=[
        ct.TensorType(name="input_ids", shape=(ct.RangeDim(), ct.RangeDim()), dtype=np.int32),
        ct.TensorType(name="token_type_ids", shape=(ct.RangeDim(), ct.RangeDim()), dtype=np.int32),
        ct.TensorType(name="attention_mask", shape=(ct.RangeDim(), ct.RangeDim()), dtype=np.int32),
    ],
    convert_to="neuralnetwork",
    compute_units=ct.ComputeUnit.CPU_ONLY,
)

# Compare PyTorch and Core ML outputs on the same input
with torch.no_grad():
    pt_out = scripted_model(**encoded_input)

cml_inputs = {k: v.to(torch.int32).numpy() for k, v in encoded_input.items()}
pred_coreml = mlmodel.predict(cml_inputs)

np.testing.assert_allclose(pt_out[0].detach().numpy(), pred_coreml["hidden_states"], atol=1e-5, rtol=1e-4)
```
System environment:
- coremltools version: 5.2.0
- OS (e.g. MacOS version or Linux type): macOS 12.3.1 (Apple M1)
- Any other relevant version information:
  - PyTorch or TensorFlow version: PyTorch 1.11.0 and Transformers 4.18.0
- Python version: 3.8.13 (conda)
Any updates on this? I'm also seeing that the outputs from the PyTorch model and the converted Core ML model are not exactly the same.
This is also an issue when converting to `mlprogram`.
Specifying static input shapes does not help either; converting with the following still produces mismatched outputs:
```python
mlmodel = ct.convert(
    scripted_model,
    source="pytorch",
    inputs=[
        ct.TensorType(name="input_ids", shape=encoded_input["input_ids"].shape, dtype=np.int32),
        ct.TensorType(name="token_type_ids", shape=encoded_input["token_type_ids"].shape, dtype=np.int32),
        ct.TensorType(name="attention_mask", shape=encoded_input["attention_mask"].shape, dtype=np.int32),
    ],
    convert_to="mlprogram",
    compute_units=ct.ComputeUnit.CPU_ONLY,
)
```
Hi, I'm also running into a similar issue. Any updates on this?
An absolute tolerance of 1e-5 and a relative tolerance of 1e-4 are too strict for a model of this size. The outputs of the Core ML model and the PyTorch model are actually quite close.
The outputs match even more closely if you convert `traced_model` rather than `scripted_model`.
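For example, something like the following (a sketch only: the tolerances are illustrative and the exact values that pass will depend on hardware and coremltools version; the output name `hidden_states` is taken from the repro above):

```python
# Convert the traced module directly, skipping torch.jit.script
mlmodel = ct.convert(
    traced_model,
    source="pytorch",
    inputs=[
        ct.TensorType(name="input_ids", shape=encoded_input["input_ids"].shape, dtype=np.int32),
        ct.TensorType(name="token_type_ids", shape=encoded_input["token_type_ids"].shape, dtype=np.int32),
        ct.TensorType(name="attention_mask", shape=encoded_input["attention_mask"].shape, dtype=np.int32),
    ],
    convert_to="neuralnetwork",
    compute_units=ct.ComputeUnit.CPU_ONLY,
)

with torch.no_grad():
    pt_out = traced_model(**encoded_input)

cml_inputs = {k: v.to(torch.int32).numpy() for k, v in encoded_input.items()}
pred_coreml = mlmodel.predict(cml_inputs)

# Looser tolerances that are more realistic for a deep float32 transformer
np.testing.assert_allclose(
    pt_out[0].detach().numpy(), pred_coreml["hidden_states"],
    atol=1e-3, rtol=1e-2,
)
```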