onnx-coreml icon indicating copy to clipboard operation
onnx-coreml copied to clipboard

Different outputs between pytorch, onnx and coreml

Open marc-jv opened this issue 4 years ago • 1 comments

🐞Describe the bug

I'm training a neural network containing an embedding layer, a linear layer, a GRU, then followed with another linear and a logsoftmax. It's trained properly and exported as well to onnx and coreml. However, when I test (that is, computing a forward step with a dummy input) the three models (.pth, .onnx and .mlmodel), the pytorch and onnx models yield the same output but the coreml doesn't.

Trace

This is what happens when I compare the pytorch output vs the coreml output:

AssertionError: 
Not equal to tolerance rtol=0.001, atol=1e-05

Mismatched elements: 98 / 100 (98%)
Max absolute difference: 0.53596115
Max relative difference: 0.1372321
 x: array([[-5.277452, -7.18906 , -3.870493, -5.237523, -5.697977, -4.906397,
        -5.19996 , -5.691385, -5.81255 , -5.201305, -5.705749, -3.767732,
        -5.457488, -6.338089, -6.073925, -5.906573, -5.663018, -6.57819 ,...
 y: array([[-5.169696, -6.974405, -3.716341, -5.348598, -6.07383 , -4.743478,
        -5.518433, -5.430327, -5.695125, -5.232084, -5.561243, -3.704498,
        -5.26874 , -6.728903, -6.277035, -5.768873, -5.754221, -6.781257,...

Unfortunately I can't attach the models since they belong to my company.

System environment (please complete the following information):

  • coremltools version: 3.4
  • onnx-coreml version: 1.3
  • OS: MacOS
  • macOS version (if applicable): 10.15.4
  • How you install python: virtualenv
  • python version: 3.7
  • any other relevant information:

Additional context

I deleted the GRU and adapted a bit the network and everything works out, so I guess there's an issue with it. The network receives as input also the initial hidden state. Also, the batch_first flag is set to false.

UPDATE

Ok so I simplified (a lot) the network and came up with the following script, which has the same issue I described above:

import torch as th
import torch.nn as nn
import torch.nn.functional as F
import numpy as np
import onnx
import onnxruntime
from onnx_coreml import convert


class NeuralNet(nn.Module):
    def __init__(self, input_size, output_size, embeddings_dim, hidden_layer):
        super(NeuralNet, self).__init__()

        self.embeddings = nn.Embedding(input_size, embeddings_dim)

        self.gru1 = nn.GRU(embeddings_dim, hidden_layer)

        self.fc_out = nn.Linear(hidden_layer, output_size)

    def forward(self, inputs, hidden):
        embedded = self.embeddings(inputs)

        outputs, hidden = self.gru1(embedded.transpose(1, 0), hidden)

        logits = self.fc_out(outputs.transpose(1, 0)[:, -1, :])

        log_outputs = F.log_softmax(logits, dim=-1)

        return log_outputs


def to_numpy(tensor, dtype=None):
    np_tensor = tensor.detach().cpu().numpy() if tensor.requires_grad else tensor.cpu().numpy()

    if dtype:
        return np_tensor.astype(dtype)
    return np_tensor


def export_to_onnx(net, x, hidden, torch_out):
    onnx_file_path = "net.onnx"

    # Export the model
    th.onnx.export(net,  # model being run
                   (x, hidden),  # model input (or a tuple for multiple inputs)
                   onnx_file_path,  # where to save the model (can be a file or file-like object)
                   input_names=['input', 'hidden'],  # the model's input names
                   output_names=['output']  # the model's output names
                   )
    onnx_model = onnx.load(onnx_file_path)
    onnx.checker.check_model(onnx_model)
    ort_session = onnxruntime.InferenceSession(onnx_file_path)

    # compute ONNX Runtime output prediction

    ort_inputs = {ort_session.get_inputs()[0].name: to_numpy(x), ort_session.get_inputs()[1].name: to_numpy(hidden)}
    ort_outs = ort_session.run(None, ort_inputs)
    # compare ONNX Runtime and PyTorch results
    np.testing.assert_allclose(to_numpy(torch_out), ort_outs[0], rtol=1e-03, atol=1e-05)
    print("Exported model has been tested with ONNXRuntime, and the result looks good!")


def export_to_coreml(x, hidden, torch_out):
    onnx_file_path = "net.onnx"
    coreml_file_path = "net.mlmodel"

    model = convert(model=onnx_file_path, minimum_ios_deployment_target='13')
    model.save(coreml_file_path)

    # compute CoreML output prediction
    output = model.predict({"input": to_numpy(x, np.float32), "hidden": to_numpy(hidden)})[
        'output']
    # compare CoreML and PyTorch results
    np.testing.assert_allclose(to_numpy(torch_out), output, rtol=1e-03, atol=1e-05)
    print("Exported model has been tested with CoreML, and the result looks good!")


if __name__ == "__main__":
    input_size = 60
    output_size = 100
    embeddings_dim = 10
    hidden_layers = 8

    net = NeuralNet(input_size, output_size, embeddings_dim, hidden_layers)

    x = th.zeros((1, 60), dtype=th.long)
    hidden = th.zeros(1, 1, hidden_layers)

    torch_out = net.forward(x, hidden)

    export_to_onnx(net, x, hidden, torch_out)
    export_to_coreml(x, hidden, torch_out)

Regarding this line:

output = model.predict({"input": to_numpy(x, np.float32), "hidden": to_numpy(hidden)})[
        'output']

I'm not sure if it makes sense to use an input of type Float, since the first layer is and embedding layer, but couldn't figure out the way to use a Long input in coreml.

UPDATE 2

As I said, if you remove the GRU everything goes well, that is, changing the network to:

class NeuralNet(nn.Module):
    def __init__(self, input_size, output_size, embeddings_dim, hidden_layer):
        super(NeuralNet, self).__init__()

        self.embeddings = nn.Embedding(input_size, embeddings_dim)

        #self.gru1 = nn.GRU(embeddings_dim, hidden_layer)

        self.fc_out = nn.Linear(embeddings_dim * input_size, output_size)

    def forward(self, inputs, hidden):
        embedded = self.embeddings(inputs)

        #outputs, hidden = self.gru1(embedded.transpose(1, 0), hidden)

        logits = self.fc_out(embedded.view(1, -1))

        log_outputs = F.log_softmax(logits, dim=-1)

        return log_outputs

marc-jv avatar Jun 16 '20 13:06 marc-jv

Did you have time to look into this @aseemw ? Thanks

francescojalvo avatar Jun 23 '20 10:06 francescojalvo