
Validation of Onnx input using ApplyOnnxModel

Open DeveloperNo579212 opened this issue 3 years ago • 10 comments

  • Windows 10 build 19044.1469, with the latest versions of the following as of 2022-02-01:
  • Microsoft.ML.NET Version: 1.7
  • Microsoft.ML.ImageAnalytics: 1.7.0
  • Microsoft.OnnxRuntime: 1.10
  • Microsoft.OnnxRuntime.GPU: 1.10
  • Microsoft.OnnxTransformer: 1.7.0
  • .NET Framework 4.8

I created a very simple ImagePrediction model and converted it from SavedModel to ONNX using tf2onnx. I have verified in Python that the conversion is OK.

Using the ONNX file in ML.NET does not verify OK: classification fails. I get a high prediction score for the image, but it is assigned to the wrong column (class).

The image is normalized to the range (-1, 1) both in Python and in my ML.NET implementation. Removing the normalization in Python produces the same kind of problem that I see in ML.NET with normalization applied. The image format in use is PNG.
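A minimal sketch of the normalization described above, for comparing the two sides. It assumes the common `x / 127.5 - 1` convention for mapping 8-bit channel values to (-1, 1); the thread does not confirm which divisor the Python script uses, and the function name `normalize_pixel` is hypothetical. The C# mapping posted later in the thread divides by 127f, so the second call shows what that divisor does at the upper end of the range.

```python
def normalize_pixel(value, scale=127.5):
    """Map an 8-bit channel value (0..255) to approximately [-1, 1]."""
    return value / scale - 1.0


# With scale=127.5 (an assumed convention) the endpoints land exactly on -1 and 1.
print(normalize_pixel(0), normalize_pixel(255))  # -1.0 1.0

# With scale=127.0 the upper end slightly overshoots 1.
print(normalize_pixel(255, scale=127.0))
```

The difference between the two divisors is under one percent, so on its own it is unlikely to flip a class, but it is worth ruling out when the goal is identical output from both runtimes.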

The model is very simple, with two classes of 10 images each. During training it gives a 100% hit rate when validated against a training image. The conversion from SavedModel to ONNX is confirmed OK, and the converted model predicts consistently even if not as well as in training.

As the last step in my verification, I need a reference implementation of image classification in ML.NET that takes an ONNX model as input.

I used a slightly modified version of this (section "Verifying a Converted Model"), adding normalization of the input image: https://onnxruntime.ai/docs/tutorials/tf-get-started.html

I want to verify that my ONNX model produces the same output in ML.NET as it does in Python, as described above.

I used opset 11 as noted in (https://docs.microsoft.com/en-us/dotnet/api/microsoft.ml.transforms.onnx.onnxscoringestimator?view=ml-dotnet)

DeveloperNo579212 avatar Feb 01 '22 07:02 DeveloperNo579212

Can you provide a repro project so we can take a look?

My guess is that it has to do with the type of normalization. It's possible that the normalization Python is doing is different from what we are doing. A simple repro project would help us verify whether this is the issue or whether it's something else.

michaelgsharp avatar Feb 01 '22 21:02 michaelgsharp

I added some example source code here (not intended to compile directly). The pipeline is set up, and I tried out different CustomMapping implementations. This is intended to be equivalent to the Python implementation, which seems to work OK. (I checked this in both .NET Framework 4.8 and .NET 5 implementations. I added my Python code as an attachment; my Python code runs OK with CUDA 11.6.)

static string IMAGE_FILE = "class2.png";     // change this to test the prediction classification
static string ONNX_MODEL_PATH = "model2.onnx";  // tested ok in python
static string IMAGELABEL = "customized_image";

 public class NormalizeInput
    {
        [ColumnName("customized_image")]
        [VectorType(1, 224, 224, 3)]
        public VBuffer<float> Reshape;
    }
    public class NormalizeOutput
    {
        [ColumnName("sequential_5_input")]
        [VectorType(3 * 224 * 224)]
        public VBuffer<float> Reshape;
    }

    public class ImageInputData
    {
        [ImageType(224, 224)]
        [ColumnName("sequential_5_input")]
        public Bitmap ImageData { get; set; }
    }
    public class OnnxOutput
    {
        [ColumnName("sequential_7")]
        public float[] OutputNode { get; set; }
    }

Action<NormalizeInput, NormalizeOutput> mapping = (input, output) =>
            {
                // Copy the extracted pixel values and rescale each one from 0..255 toward (-1, 1).
                var values = input.Reshape.GetValues().ToArray();
                for (int x = 0; x < values.Length; x++)
                {
                    values[x] = (values[x] / 127f) - 1;
                }
                output.Reshape = new VBuffer<float>(values.Length, values);
            };

            var onnxOptions = new OnnxOptions()
            {
                ModelFile = ONNX_MODEL_PATH,
                InputColumns = new string[] { "sequential_5_input"},
                OutputColumns = new string[] { "sequential_7"},
            };

            InitializeComponent();
            MLContext mlContext = new MLContext();

            var pipeline =
                mlContext.Transforms.ResizeImages(
                outputColumnName: IMAGELABEL,
                224,
                224,
                inputColumnName: "sequential_5_input"
                )
            .Append(
                mlContext.Transforms.ExtractPixels(
                inputColumnName: IMAGELABEL,
                outputColumnName: IMAGELABEL,
                colorsToExtract: ImagePixelExtractingEstimator.ColorBits.Rgb)
                )
            .Append(
                    mlContext.Transforms.CustomMapping(mapping, contractName: null)
                )
            .Append(
                mlContext.Transforms.ApplyOnnxModel(onnxOptions)
            );

     
           var emptydata = mlContext.Data.LoadFromEnumerable(new List<ImageInputData>() {});            
            var trainedmodel = pipeline.Fit(emptydata);
            // create a prediction engine
            var onnxPredictionEngine = mlContext.Model.CreatePredictionEngine<ImageInputData, OnnxOutput>(trainedmodel);

            var img = Bitmap.FromFile(IMAGE_FILE);
            var image = new ImageInputData() { ImageData = (Bitmap)img};
            var prediction = onnxPredictionEngine.Predict(image);

classificationok.txt I have updated the attachment 2022-02-03

// check the classification in the prediction output

DeveloperNo579212 avatar Feb 02 '22 06:02 DeveloperNo579212

Issue #5946 seems somewhat related.

DeveloperNo579212 avatar Feb 08 '22 14:02 DeveloperNo579212

I just confirmed that the potential error is still present with the most recent Microsoft.ML.NET 1.7.1 update, including Microsoft.ML 1.7.1, Microsoft.OnnxTransformer 1.7.1, Microsoft.ML.TensorFlow 1.7.1, Microsoft.ML.ImageAnalytics 1.7.1, etc. Confirmed 2022-03-10.

DeveloperNo579212 avatar Mar 10 '22 06:03 DeveloperNo579212

I attach the zipped ONNX file here: model2.zip. The input is just a color image of a left or right hand, 224x224 in size.

DeveloperNo579212 avatar Mar 15 '22 09:03 DeveloperNo579212

OK, I figured out the issue (I think, anyway). The model input has its first dimension defined as unk__605, which means the dimension is of unknown size. We do allow the first dimension to be of unknown size, but our code expects that to be represented as -1 (or, I believe, any negative number). How did you convert your model to ONNX? I'm trying to figure out where that representation is coming from. I know this works in Python; I wonder whether the Python code converts it to something the native ONNX runtime understands, or whether the native ONNX runtime understands that format directly.

@luisquintanilla, if this definition is common we will need to modify our loading code so that it treats unk__*** the same way as it does -1.
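The idea of treating a symbolic dimension name the same way as -1 can be sketched as follows. This is an illustration only, not ML.NET's actual loading code; the function name `normalize_shape` is hypothetical, and it assumes a shape arrives as a list in which each entry is either a positive integer or some other marker (a name such as `unk__605`, a negative number, or a missing value) standing for an unknown size.

```python
def normalize_shape(dims):
    """Return the shape with every unknown dimension represented as -1.

    A dimension counts as known only if it is a positive integer.
    Symbolic names, None, zero and negative numbers all become -1.
    """
    normalized = []
    for dim in dims:
        if isinstance(dim, int) and dim > 0:
            normalized.append(dim)
        else:
            normalized.append(-1)
    return normalized


print(normalize_shape(["unk__605", 224, 224, 3]))  # [-1, 224, 224, 3]
```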

michaelgsharp avatar Mar 18 '22 20:03 michaelgsharp

OK, for reference, the model was converted to ONNX using the command below (also described in the SavedModel section here: https://onnxruntime.ai/docs/tutorials/tf-get-started.html). For troubleshooting I tried different opsets in the conversion (7, 11, 13, etc.), with no difference in outcome.

python -m tf2onnx.convert --saved-model path/to/savedmodel --output dst/path/model.onnx --opset 11

I have attached this here zipped; model.zip

(The model2.onnx attached earlier was converted the same way and then had the command below applied to it. That might explain the unk_605? In any case, the outcome is no different from the model attached above.)

python -m remove_initializer_from_input --input model.onnx --output model2.onnx

Update 2022-03-20: I checked both ONNX files in netron.app and both include unk_***. I am looking forward to any update that resolves this; a small but groundbreaking change, I think!

DeveloperNo579212 avatar Mar 18 '22 21:03 DeveloperNo579212

@DeveloperNo579212 I looked into this more today, intending to fix it. It turns out it's not the issue I thought it was, sadly. We are handling that input just fine. This means it probably has to do with how the input passed into the ONNX model is ordered. Do you know what the color extraction order is for your model? And when I ran your code it gave me back 2 percentages; do you know which one corresponds to the left hand and which to the right hand?
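To make the ordering question concrete, the sketch below shows the two common ways to flatten the same pixels: interleaved (R, G, B per pixel, as a TensorFlow-style input of shape 1 x 224 x 224 x 3 expects) and planar (all R values, then all G, then all B). ML.NET's ExtractPixels has an `interleavePixelColors` option that defaults to false, which produces the planar layout. Whether this is the cause of the misclassification in this thread is not confirmed; the helper names are hypothetical.

```python
def interleaved_to_planar(pixels, channels=3):
    """Convert [R0, G0, B0, R1, G1, B1, ...] to [R0, R1, ..., G0, G1, ..., B0, B1, ...]."""
    return [
        pixels[i]
        for c in range(channels)
        for i in range(c, len(pixels), channels)
    ]


def planar_to_interleaved(pixels, channels=3):
    """Inverse of interleaved_to_planar."""
    count = len(pixels) // channels  # number of pixels per channel plane
    return [pixels[c * count + i] for i in range(count) for c in range(channels)]


# Two pixels: (R=1, G=2, B=3) and (R=4, G=5, B=6).
interleaved = [1, 2, 3, 4, 5, 6]
planar = interleaved_to_planar(interleaved)
print(planar)  # [1, 4, 2, 5, 3, 6]
print(planar_to_interleaved(planar) == interleaved)  # True
```

Both layouts contain the same numbers, so a model fed the wrong one still returns confident scores, just for scrambled input.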

michaelgsharp avatar Apr 11 '22 23:04 michaelgsharp

OK, the first column, "class1", is fingers pointing to the right on my display (the second column, "class2", is the opposite). The image input is a 224x224, 32-bit color PNG. With training images as input I get a 90 percent prediction on class 1 and an almost 100 percent prediction on class 2. The training set is 10 images of each class, with only slightly different lighting etc.

I attach OnnxVerify.py, which I use to check ONNX models. I cross-checked that the results are the same using model.onnx and model2.onnx.
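A cross-check like the one described above can be reduced to two questions: do both outputs pick the same class, and are the scores numerically close? The sketch below is not the attached OnnxVerify.py; the function name `compare_scores`, the tolerance, and the example scores are all hypothetical.

```python
import math


def compare_scores(reference, candidate, tolerance=1e-4):
    """Compare two score vectors, e.g. from Python and from ML.NET.

    Returns (same_class, close): whether both pick the same top class,
    and whether every score agrees within the absolute tolerance.
    """
    same_class = reference.index(max(reference)) == candidate.index(max(candidate))
    close = len(reference) == len(candidate) and all(
        math.isclose(r, c, abs_tol=tolerance) for r, c in zip(reference, candidate)
    )
    return same_class, close


# Hypothetical scores, not taken from the thread: the second run swaps the classes.
print(compare_scores([0.90, 0.10], [0.10, 0.90]))  # (False, False)
```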

OnnxVerify.zip is here (OnnxVerify.py): OnnxVerify.zip

DeveloperNo579212 avatar Apr 12 '22 05:04 DeveloperNo579212

Just a follow-up, in case more information is needed: in the Python implementation, without normalization applied, the output ordering did not change based on the input. With normalization, the classification worked out OK.

DeveloperNo579212 avatar Jun 07 '22 06:06 DeveloperNo579212

Great, it seems this is scheduled for ML.NET 3.0 in November 2023. (I just closed this by accident.)

DeveloperNo579212 avatar Nov 21 '22 07:11 DeveloperNo579212