Model Output Disparity

Open kushalpatil07 opened this issue 1 year ago • 8 comments

Model Output Disparity for MNN model converted from ONNX

I converted mosaic.onnx to mosaic.mnn using the following command:

mnnconvert -f ONNX --modelFile mosaic.onnx --MNNModel mosaic.mnn 

Test script:

import MNN

# Run the converted MNN model on an all-ones input
mnn_model = 'mosaic.mnn'
net = MNN.nn.load_module_from_file(mnn_model, [], [])
input = MNN.numpy.ones([1, 3, 224, 224], dtype=MNN.numpy.float32)
input = MNN.expr.convert(input, MNN.expr.NC4HW4)
output = net.forward(input)
output = MNN.expr.convert(output, MNN.expr.NCHW)
print("######################## MNN Output##############################")
print(output)
print('#################################################################')
print("")

import onnxruntime as ort
import numpy as np

# Run the original ONNX model on the same all-ones input
onnx_model = 'mosaic.onnx'

model = ort.InferenceSession(onnx_model)
output = model.run(None, {'input1': np.ones([1, 3, 224, 224], dtype=np.float32)})
print("######################## ONNX Output##############################")
print(output)
print('#################################################################')

The two models produce different outputs when given an all-ones input.

Test setup: MNN version 2.8.1, onnxruntime version 1.17.0, x86_64 Linux machine, Python 3.10.12

Could someone help me understand why these outputs differ? I am trying to move from ONNX to MNN for the performance as well as the binary size benefits.

I am attaching both models and the above script as a zip for reference: mosaic.zip

kushalpatil07, Mar 26 '24 13:03

Please test with testMNNFromOnnx.py first.

jxt1234, Mar 27 '24 06:03

I have run testMNNFromOnnx.py. The test passes when given random input, np.random.uniform(0, 12, shapes). When I edited the script to take np.zeros(shapes) as input, the test fails. On a side note, this mosaic model is a standard model taken from https://github.com/onnx/models/tree/main/validated/vision/style_transfer/fast_neural_style

kushalpatil07, Mar 27 '24 07:03

Hey @jxt1234, is someone looking into this? I would really appreciate the help :)

kushalpatil07, Apr 01 '24 05:04

If testMNNFromOnnx.py reports TEST_SUCCESS, MNN's result is correct. You can convert the model with --keepInputFormat and skip the call to MNN.expr.convert(input,MNN.expr.NC4HW4).

jxt1234, Apr 02 '24 13:04

"testMNNFromOnnx.py TEST_SUCCESS means MNN is right."

I feel we are having a misunderstanding. What do you mean by this exactly?

I am giving you an example, mosaic.onnx and mosaic.mnn, whose outputs do not match for a particular input. They still do not match after incorporating your suggestion: "You can convert MNN with --keepInputFormat and don't call MNN.expr.convert(input,MNN.expr.NC4HW4)."

Steps to reproduce the issue:

  1. The script testMNNFromOnnx.py feeds random input to the model; that run passes the test and gives TEST_SUCCESS.
  2. When we edit the script to take np.zeros(shape) as input, the test itself fails.

@jxt1234, could you try to reproduce this issue using the two steps above? I am attaching the edited testMNNFromOnnx.py for easy reproduction. I marked the change on line 153 with the comment # EDITED HERE, where I edited the script to take zeros as input.

test_onnx.zip

kushalpatil07, Apr 03 '24 21:04

In your case, np.zeros / np.ones makes all the input data identical, so the variance (var) is very small. That causes a disparity in the sqrt function. This is not a common case. Please use real input data to test.
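
To illustrate the variance argument above, here is a minimal pure-Python sketch of instance normalization on a single channel, (x - mean) / sqrt(var + eps). It is not MNN's or onnxruntime's implementation. The function name instance_norm, the 16-element channel, the 1e-6 perturbation standing in for backend rounding noise, and eps = 1e-12 (the value quoted later in this thread; the model's actual epsilon is not stated here) are all assumptions made for illustration.

```python
import math


def instance_norm(values, eps):
    """Normalize one channel: (x - mean) / sqrt(var + eps)."""
    n = len(values)
    mean = sum(values) / n
    var = sum((v - mean) ** 2 for v in values) / n
    scale = 1.0 / math.sqrt(var + eps)
    return [(v - mean) * scale for v in values]


EPS = 1e-12  # assumption: taken from the sqrt(0.0 + 0.000000000001) remark in this thread

# A perfectly constant channel, as produced by an all-ones input.
constant = [1.0] * 16

# The same channel with one element nudged by 1e-6, a hypothetical
# stand-in for rounding differences between two backends.
noisy = [1.0] * 16
noisy[0] += 1e-6

out_constant = instance_norm(constant, EPS)
out_noisy = instance_norm(noisy, EPS)

# Largest element-wise difference between the two normalized channels.
max_gap = max(abs(a - b) for a, b in zip(out_constant, out_noisy))
print("max output gap:", max_gap)
```

With a perfectly constant channel the normalized output is exactly zero, while a perturbation of one part in a million in a single element moves the output by roughly order one, because the divisor sqrt(var + eps) is itself about 1e-6. Rounding differences between backends can therefore be amplified into visibly different results on constant inputs.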

jxt1234, Apr 05 '24 04:04

@jxt1234, we are also observing another disparity: the test script shared above gives very different results on Mac and Linux machines for np.ones and np.zeros inputs.

Test setup of Linux machine: MNN version 2.8.1, onnxruntime version 1.17.0, x86_64 Linux machine, Python 3.10.12

Test setup of Mac machine: MNN version 2.8.1, onnxruntime version 1.17.0, Mac M1 chip (2020), Python 3.7.16

Output on Linux machine

######################## MNN Output##############################
array([[[[ 1.7789851e+02,  1.7789816e+02,  1.7789783e+02, ...,
           1.7790385e+02,  1.7790039e+02,  1.7790286e+02],
         [ 1.7789778e+02,  1.7789758e+02,  1.7789714e+02, ...,
           1.7790488e+02,  1.7790128e+02,  1.7790410e+02],
         [ 1.7789806e+02,  1.7789786e+02,  1.7789740e+02, ...,
           1.7790530e+02,  1.7790167e+02,  1.7790437e+02],
         ...,
         [ 1.7787030e+02,  1.7786987e+02,  1.7786964e+02, ...,
          -1.8242943e+01, -3.4454234e+00,  9.5861053e+00],
         [ 1.7786783e+02,  1.7786771e+02,  1.7786725e+02, ...,
          -3.3775089e+01, -2.9153913e+01, -3.9820961e+01],
         [ 1.7787071e+02,  1.7787071e+02,  1.7787039e+02, ...,
          -7.8822029e+01, -7.9405579e+01, -9.0173950e+01]],

        [[ 1.6287909e+02,  1.6287906e+02,  1.6287897e+02, ...,
           1.6287405e+02,  1.6287401e+02,  1.6287613e+02],
         [ 1.6287944e+02,  1.6287935e+02,  1.6287930e+02, ...,
           1.6287408e+02,  1.6287430e+02,  1.6287608e+02],
         [ 1.6287997e+02,  1.6288005e+02,  1.6287979e+02, ...,
           1.6287399e+02,  1.6287418e+02,  1.6287605e+02],
         ...,
         [ 1.6285625e+02,  1.6285625e+02,  1.6285588e+02, ...,
          -4.3152927e+01, -2.9389851e+01, -3.3526914e+00],
         [ 1.6285176e+02,  1.6285170e+02,  1.6285165e+02, ...,
           2.2175953e+01,  2.8807076e+01,  3.5963924e+01],
         [ 1.6285019e+02,  1.6284982e+02,  1.6284982e+02, ...,
           2.5776017e+00, -6.1696470e-03,  5.3866558e+00]],

        [[ 1.5353537e+02,  1.5353502e+02,  1.5353499e+02, ...,
           1.5352097e+02,  1.5351585e+02,  1.5351753e+02],
         [ 1.5353632e+02,  1.5353571e+02,  1.5353595e+02, ...,
           1.5352025e+02,  1.5351535e+02,  1.5351712e+02],
         [ 1.5353517e+02,  1.5353476e+02,  1.5353500e+02, ...,
           1.5352109e+02,  1.5351575e+02,  1.5351735e+02],
         ...,
         [ 1.5354117e+02,  1.5354056e+02,  1.5354077e+02, ...,
           2.3128716e+01,  4.9280975e+01,  6.8965607e+01],
         [ 1.5353847e+02,  1.5353783e+02,  1.5353815e+02, ...,
           7.4308746e+01,  8.9900993e+01,  8.4358002e+01],
         [ 1.5353415e+02,  1.5353368e+02,  1.5353375e+02, ...,
           2.3612188e+01,  2.1621935e+01,  1.5537018e+01]]]],
      dtype=float32)
#################################################################

######################## ONNX Output##############################
[array([[[[171.34421, 171.34421, 171.34421, ..., 171.34421, 171.34421,
          171.34421],
         [171.34421, 171.34421, 171.34421, ..., 171.34421, 171.34421,
          171.34421],
         [171.34421, 171.34421, 171.34421, ..., 171.34421, 171.34421,
          171.34421],
         ...,
         [171.34421, 171.34421, 171.34421, ..., 171.34421, 171.34421,
          171.34421],
         [171.34421, 171.34421, 171.34421, ..., 171.34421, 171.34421,
          171.34421],
         [171.34421, 171.34421, 171.34421, ..., 171.34421, 171.34421,
          171.34421]],

        [[157.65192, 157.65192, 157.65192, ..., 157.65192, 157.65192,
          157.65192],
         [157.65192, 157.65192, 157.65192, ..., 157.65192, 157.65192,
          157.65192],
         [157.65192, 157.65192, 157.65192, ..., 157.65192, 157.65192,
          157.65192],
         ...,
         [157.65192, 157.65192, 157.65192, ..., 157.65192, 157.65192,
          157.65192],
         [157.65192, 157.65192, 157.65192, ..., 157.65192, 157.65192,
          157.65192],
         [157.65192, 157.65192, 157.65192, ..., 157.65192, 157.65192,
          157.65192]],

        [[147.90988, 147.90988, 147.90988, ..., 147.90988, 147.90988,
          147.90988],
         [147.90988, 147.90988, 147.90988, ..., 147.90988, 147.90988,
          147.90988],
         [147.90988, 147.90988, 147.90988, ..., 147.90988, 147.90988,
          147.90988],
         ...,
         [147.90988, 147.90988, 147.90988, ..., 147.90988, 147.90988,
          147.90988],
         [147.90988, 147.90988, 147.90988, ..., 147.90988, 147.90988,
          147.90988],
         [147.90988, 147.90988, 147.90988, ..., 147.90988, 147.90988,
          147.90988]]]], dtype=float32)]
#################################################################

Output on Mac Machine

######################## MNN Output##############################
array([[[[250.21507 , 250.16147 , 249.76387 , ...,  58.233986,
           59.66561 ,  51.944107],
         [249.54218 , 249.3817  , 248.97392 , ...,  58.83722 ,
           64.62019 ,  58.525745],
         [249.11377 , 249.11868 , 248.49596 , ...,  53.54505 ,
           62.351784,  56.883274],
         ...,
         [ 90.75204 ,  85.10422 ,  77.92067 , ...,  58.6197  ,
           88.46058 ,  74.92694 ],
         [139.21458 , 137.08766 , 132.7679  , ..., 123.57399 ,
          149.02847 , 110.15517 ],
         [135.13843 , 134.24225 , 134.49554 , ..., 114.151   ,
          144.84058 , 116.4156  ]],

        [[250.7712  , 251.45166 , 251.51967 , ..., 116.97559 ,
          107.31926 , 111.61564 ],
         [249.24142 , 249.67822 , 249.77678 , ..., 116.61871 ,
          110.15776 , 116.610756],
         [247.57831 , 248.08623 , 247.88712 , ..., 116.43434 ,
          113.84082 , 120.039795],
         ...,
         [129.85591 , 127.12337 , 119.98896 , ...,  83.45971 ,
          100.48431 ,  93.10752 ],
         [130.522   , 124.07818 , 122.81633 , ...,  79.169044,
           96.53803 ,  79.26343 ],
         [124.85134 , 121.22441 , 123.81856 , ...,  93.23835 ,
          112.23832 , 103.228325]],

        [[303.78763 , 303.88348 , 303.10953 , ..., 171.72421 ,
          168.28552 , 164.99461 ],
         [301.4203  , 301.25974 , 300.55664 , ..., 172.78957 ,
          172.58192 , 170.90668 ],
         [298.97623 , 298.9938  , 298.28656 , ..., 169.06041 ,
          172.94447 , 174.88399 ],
         ...,
         [120.82464 , 112.478584, 100.644356, ...,  90.14767 ,
          119.52492 , 128.43115 ],
         [144.41287 , 137.94812 , 133.33185 , ..., 128.82146 ,
          163.61607 , 151.649   ],
         [153.434   , 149.53357 , 152.06706 , ..., 156.95384 ,
          189.02585 , 174.62895 ]]]], dtype=float32)
#################################################################

######################## ONNX Output##############################
[array([[[[170.95201, 170.95201, 170.95201, ..., 170.95201, 170.95201,
          170.95201],
         [170.95201, 170.95201, 170.95201, ..., 170.95201, 170.95201,
          170.95201],
         [170.95201, 170.95201, 170.95201, ..., 170.95201, 170.95201,
          170.95201],
         ...,
         [170.95201, 170.95201, 170.95201, ..., 170.95201, 170.95201,
          170.95201],
         [170.95201, 170.95201, 170.95201, ..., 170.95201, 170.95201,
          170.95201],
         [170.95201, 170.95201, 170.95201, ..., 170.95201, 170.95201,
          170.95201]],

        [[157.96027, 157.96027, 157.96027, ..., 157.96027, 157.96027,
          157.96027],
         [157.96027, 157.96027, 157.96027, ..., 157.96027, 157.96027,
          157.96027],
         [157.96027, 157.96027, 157.96027, ..., 157.96027, 157.96027,
          157.96027],
         ...,
         [157.96027, 157.96027, 157.96027, ..., 157.96027, 157.96027,
          157.96027],
         [157.96027, 157.96027, 157.96027, ..., 157.96027, 157.96027,
          157.96027],
         [157.96027, 157.96027, 157.96027, ..., 157.96027, 157.96027,
          157.96027]],

        [[151.1252 , 151.1252 , 151.1252 , ..., 151.1252 , 151.1252 ,
          151.1252 ],
         [151.1252 , 151.1252 , 151.1252 , ..., 151.1252 , 151.1252 ,
          151.1252 ],
         [151.1252 , 151.1252 , 151.1252 , ..., 151.1252 , 151.1252 ,
          151.1252 ],
         ...,
         [151.1252 , 151.1252 , 151.1252 , ..., 151.1252 , 151.1252 ,
          151.1252 ],
         [151.1252 , 151.1252 , 151.1252 , ..., 151.1252 , 151.1252 ,
          151.1252 ],
         [151.1252 , 151.1252 , 151.1252 , ..., 151.1252 , 151.1252 ,
          151.1252 ]]]], dtype=float32)]
#################################################################

The difference between the ONNX outputs on the two devices could be attributed to the sqrt function disparity. But the difference between the MNN results is huge. Do we attribute this large change to the sqrt function as well?
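
One way to put a number on how far apart two runtimes are is a maximum absolute and relative difference. The sketch below is a hypothetical helper, not the comparison that testMNNFromOnnx.py performs; the name compare_outputs and the 1% tolerance are assumptions, and the sample values are the first element of each channel from the Linux dump above.

```python
def compare_outputs(reference, candidate, rel_tol=1e-2):
    """Return (max_abs_diff, max_rel_diff, passed) for two flat sequences.

    The relative difference is measured against the largest magnitude
    in the reference, so near-zero reference values do not dominate.
    """
    if len(reference) != len(candidate):
        raise ValueError("outputs must have the same number of elements")
    max_abs = max(abs(r - c) for r, c in zip(reference, candidate))
    ref_scale = max(abs(r) for r in reference) or 1.0
    max_rel = max_abs / ref_scale
    return max_abs, max_rel, max_rel <= rel_tol


# First element of each output channel, copied from the Linux dump.
onnx_vals = [171.34421, 157.65192, 147.90988]
mnn_vals = [177.89851, 162.87909, 153.53537]

max_abs, max_rel, passed = compare_outputs(onnx_vals, mnn_vals)
print("max abs diff:", max_abs)
print("max rel diff:", max_rel)
print("within tolerance:", passed)
```

For full tensors, the arrays would first need flattening into plain sequences of floats. These three samples come from the top-left corner only, where the MNN values stay close to a constant; the bottom-right region of the same dump deviates far more.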

Is the MNN runtime implementation so different between Arm and x86 that such behaviour is expected? The numbers are so different that it feels like either I am not using it correctly or there is a bug. I am attaching the test script again for reference:

import MNN

# Run the converted MNN model on an all-zeros input
mnn_model = 'mosaic.mnn'
net = MNN.nn.load_module_from_file(mnn_model, [], [])
input = MNN.numpy.zeros([1, 3, 224, 224], dtype=MNN.numpy.float32)
#input=MNN.expr.convert(input,MNN.expr.NC4HW4)
output = net.forward(input)
output = MNN.expr.convert(output, MNN.expr.NCHW)
print("######################## MNN Output##############################")
print(output)
print('#################################################################')
print("")

import onnxruntime as ort
import numpy as np

# Run the original ONNX model on the same all-zeros input
onnx_model = 'mosaic.onnx'

model = ort.InferenceSession(onnx_model)
output = model.run(None, {'input1': np.zeros([1, 3, 224, 224], dtype=np.float32)})
print("######################## ONNX Output##############################")
print(output)
print('#################################################################')

kushalpatil07, Apr 05 '24 10:04

np.zeros() and np.ones() are not valid inputs for this model because of its instance normalization (instancenorm) layers. They lead to computing sqrt(0.0 + 0.000000000001), which causes the disparity.
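
A test harness could screen for such inputs before comparing runtimes. The following is a hedged sketch of one possible guard, not part of MNN or testMNNFromOnnx.py: the names channel_variance and has_degenerate_channel, the default eps of 1e-12, and the safety margin of 1000 are all assumptions. It only inspects the input, so a non-constant input whose intermediate activations happen to be constant would still slip through.

```python
def channel_variance(values):
    """Population variance of one flattened channel."""
    n = len(values)
    mean = sum(values) / n
    return sum((v - mean) ** 2 for v in values) / n


def has_degenerate_channel(channels, eps=1e-12, margin=1e3):
    """True if any channel's variance is not comfortably above eps.

    In that regime sqrt(var + eps) is dominated by eps, so rounding
    noise in (x - mean) is strongly amplified by the normalization.
    """
    threshold = eps * margin
    return any(channel_variance(ch) < threshold for ch in channels)


# Three channels of 16 values each, standing in for a [3, H, W] input.
zeros = [[0.0] * 16 for _ in range(3)]
ones = [[1.0] * 16 for _ in range(3)]
ramp = [[i / 255.0 for i in range(16)] for _ in range(3)]

print(has_degenerate_channel(zeros))  # True
print(has_degenerate_channel(ones))   # True
print(has_degenerate_channel(ramp))   # False
```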

jxt1234, Apr 08 '24 08:04

Marking as stale. No activity in 60 days.

github-actions[bot], Jun 07 '24 09:06