[Bug]: convert layout to 'NCHW' (from 'NHWC' specified above at tensor layout) does not produce correct inference results
OpenVINO Version
2024.1.0-15008-f4afc983258-releases/2024/1
Operating System
Windows System
Device used for inference
CPU
Framework
None
Model used
mobilenet-v3-small-1.0-224-tf
Issue description
The command-line arguments I use for debugging:
./mobilenet-v3-small-1.0-224-tf\FP16\mobilenet-v3-small-1.0-224-tf.xml ./img/dog.bmp CPU
The complete code is as follows:
// Copyright (C) 2018-2024 Intel Corporation
// SPDX-License-Identifier: Apache-2.0
//

#include <iterator>
#include <memory>
#include <sstream>
#include <string>
#include <vector>

// clang-format off
#include "openvino/openvino.hpp"
#include "samples/args_helper.hpp"
#include "samples/common.hpp"
#include "samples/classification_results.h"
#include "samples/slog.hpp"
#include "format_reader_ptr.h"
// clang-format on

/**
 * @brief Main with support Unicode paths, wide strings
 */
int tmain(int argc, tchar* argv[]) {
    try {
        // -------- Get OpenVINO runtime version --------
        slog::info << ov::get_openvino_version() << slog::endl;

        // -------- Parsing and validation of input arguments --------
        if (argc != 4) {
            slog::info << "Usage : " << TSTRING2STRING(argv[0]) << " <path_to_model> <path_to_image> <device_name>"
                       << slog::endl;
            return EXIT_FAILURE;
        }

        const std::string args = TSTRING2STRING(argv[0]);
        const std::string model_path = TSTRING2STRING(argv[1]);
        const std::string image_path = TSTRING2STRING(argv[2]);
        const std::string device_name = TSTRING2STRING(argv[3]);

        // -------- Step 1. Initialize OpenVINO Runtime Core --------
        ov::Core core;

        // -------- Step 2. Read a model --------
        slog::info << "Loading model files: " << model_path << slog::endl;
        std::shared_ptr<ov::Model> model = core.read_model(model_path);
        printInputAndOutputsInfo(*model);

        OPENVINO_ASSERT(model->inputs().size() == 1, "Sample supports models with 1 input only");
        OPENVINO_ASSERT(model->outputs().size() == 1, "Sample supports models with 1 output only");

        // -------- Step 3. Set up input
        // Read input image to a tensor and set it to an infer request
        // without resize and layout conversions
        FormatReader::ReaderPtr reader(image_path.c_str());
        if (reader.get() == nullptr) {
            std::stringstream ss;
            ss << "Image " + image_path + " cannot be read!";
            throw std::logic_error(ss.str());
        }

        ov::element::Type input_type = ov::element::u8;
        ov::Shape input_shape = {1, reader->height(), reader->width(), 3};
        std::shared_ptr<unsigned char> input_data = reader->getData();

        // just wrap image data by ov::Tensor without allocating of new memory
        ov::Tensor input_tensor = ov::Tensor(input_type, input_shape, input_data.get());

        const ov::Layout tensor_layout{"NHWC"};

        // -------- Step 4. Configure preprocessing --------
        ov::preprocess::PrePostProcessor ppp(model);

        // 1) Set input tensor information:
        // - input() provides information about a single model input
        // - reuse precision and shape from already available `input_tensor`
        // - layout of data is 'NHWC'
        ppp.input().tensor().set_shape(input_shape).set_element_type(input_type).set_layout(tensor_layout);
        // 2) Adding explicit preprocessing steps:
        // - convert layout to 'NCHW' (from 'NHWC' specified above at tensor layout)
        // - apply linear resize from tensor spatial dims to model spatial dims
        ppp.input().preprocess().resize(ov::preprocess::ResizeAlgorithm::RESIZE_LINEAR);
        // 4) Suppose model has 'NCHW' layout for input
        ppp.input().model().set_layout("NHWC");
        // 5) Set output tensor information:
        // - precision of tensor is supposed to be 'f32'
        ppp.output().tensor().set_element_type(ov::element::f32);
        // 6) Apply preprocessing modifying the original 'model'
        model = ppp.build();
        printInputAndOutputsInfo(*model);

        // ======== Step 3: Save the model ================
        std::string xml = "./some_model_saved.xml";
        std::string bin = "./some_model_saved.bin";
        ov::save_model(model, xml);

        // -------- Step 5. Loading a model to the device --------
        ov::CompiledModel compiled_model = core.compile_model(model, device_name);

        // -------- Step 6. Create an infer request --------
        ov::InferRequest infer_request = compiled_model.create_infer_request();
        // -----------------------------------------------------------------------------------------------------

        // -------- Step 7. Prepare input --------
        infer_request.set_input_tensor(input_tensor);

        // -------- Step 8. Do inference synchronously --------
        infer_request.infer();

        // -------- Step 9. Process output
        const ov::Tensor& output_tensor = infer_request.get_output_tensor();

        // Print classification results
        ClassificationResult classification_result(output_tensor, {image_path});
        classification_result.show();
        // -----------------------------------------------------------------------------------------------------
    } catch (const std::exception& ex) {
        std::cerr << ex.what() << std::endl;
        return EXIT_FAILURE;
    }

    return EXIT_SUCCESS;
}
Result:
[ INFO ] Build ................................. 2024.1.0-15008-f4afc983258-releases/2024/1
[ INFO ]
[ INFO ] Loading model files: D:\pyworkspace\openvino\public\mobilenet-v3-small-1.0-224-tf\FP16\mobilenet-v3-small-1.0-224-tf.xml
[ INFO ] model name: TensorFlow_Frontend_IR
[ INFO ] inputs
[ INFO ] input name: input_1
[ INFO ] input type: f32
[ INFO ] input shape: [1,224,224,3]
[ INFO ] outputs
[ INFO ] output name: Predictions
[ INFO ] output type: f32
[ INFO ] output shape: [1,1000]
[ INFO ] model name: TensorFlow_Frontend_IR
[ INFO ] inputs
[ INFO ] input name: input_1
[ INFO ] input type: u8
[ INFO ] input shape: [1,224,224,3]
[ INFO ] outputs
[ INFO ] output name: Predictions
[ INFO ] output type: f32
[ INFO ] output shape: [1,1000]
Top 10 results:
Image ./img/dog.bmp
classid probability
------- -----------
156 0.9296221
218 0.0113006
212 0.0073485
215 0.0042554
152 0.0019013
219 0.0018406
217 0.0013028
220 0.0006854
157 0.0006666
213 0.0005458
I have modified these two lines:
ov::Shape input_shape = {1,3, reader->height(), reader->width()};
ppp.input().model().set_layout("NCHW");
The complete code is as follows:
// Copyright (C) 2018-2024 Intel Corporation
// SPDX-License-Identifier: Apache-2.0
//

#include <iterator>
#include <memory>
#include <sstream>
#include <string>
#include <vector>

// clang-format off
#include "openvino/openvino.hpp"
#include "samples/args_helper.hpp"
#include "samples/common.hpp"
#include "samples/classification_results.h"
#include "samples/slog.hpp"
#include "format_reader_ptr.h"
// clang-format on

/**
 * @brief Main with support Unicode paths, wide strings
 */
int tmain(int argc, tchar* argv[]) {
    try {
        // -------- Get OpenVINO runtime version --------
        slog::info << ov::get_openvino_version() << slog::endl;

        // -------- Parsing and validation of input arguments --------
        if (argc != 4) {
            slog::info << "Usage : " << TSTRING2STRING(argv[0]) << " <path_to_model> <path_to_image> <device_name>"
                       << slog::endl;
            return EXIT_FAILURE;
        }

        const std::string args = TSTRING2STRING(argv[0]);
        const std::string model_path = TSTRING2STRING(argv[1]);
        const std::string image_path = TSTRING2STRING(argv[2]);
        const std::string device_name = TSTRING2STRING(argv[3]);

        // -------- Step 1. Initialize OpenVINO Runtime Core --------
        ov::Core core;

        // -------- Step 2. Read a model --------
        slog::info << "Loading model files: " << model_path << slog::endl;
        std::shared_ptr<ov::Model> model = core.read_model(model_path);
        printInputAndOutputsInfo(*model);

        OPENVINO_ASSERT(model->inputs().size() == 1, "Sample supports models with 1 input only");
        OPENVINO_ASSERT(model->outputs().size() == 1, "Sample supports models with 1 output only");

        // -------- Step 3. Set up input
        // Read input image to a tensor and set it to an infer request
        // without resize and layout conversions
        FormatReader::ReaderPtr reader(image_path.c_str());
        if (reader.get() == nullptr) {
            std::stringstream ss;
            ss << "Image " + image_path + " cannot be read!";
            throw std::logic_error(ss.str());
        }

        ov::element::Type input_type = ov::element::u8;
        ov::Shape input_shape = {1, 3, reader->height(), reader->width()};
        std::shared_ptr<unsigned char> input_data = reader->getData();

        // just wrap image data by ov::Tensor without allocating of new memory
        ov::Tensor input_tensor = ov::Tensor(input_type, input_shape, input_data.get());

        const ov::Layout tensor_layout{"NHWC"};

        // -------- Step 4. Configure preprocessing --------
        ov::preprocess::PrePostProcessor ppp(model);

        // 1) Set input tensor information:
        // - input() provides information about a single model input
        // - reuse precision and shape from already available `input_tensor`
        // - layout of data is 'NHWC'
        ppp.input().tensor().set_shape(input_shape).set_element_type(input_type).set_layout(tensor_layout);
        // 2) Adding explicit preprocessing steps:
        // - convert layout to 'NCHW' (from 'NHWC' specified above at tensor layout)
        // - apply linear resize from tensor spatial dims to model spatial dims
        ppp.input().preprocess().resize(ov::preprocess::ResizeAlgorithm::RESIZE_LINEAR);
        // 4) Suppose model has 'NCHW' layout for input
        ppp.input().model().set_layout("NCHW");
        // 5) Set output tensor information:
        // - precision of tensor is supposed to be 'f32'
        ppp.output().tensor().set_element_type(ov::element::f32);
        // 6) Apply preprocessing modifying the original 'model'
        model = ppp.build();
        printInputAndOutputsInfo(*model);

        // ======== Step 3: Save the model ================
        std::string xml = "./some_model_saved.xml";
        std::string bin = "./some_model_saved.bin";
        ov::save_model(model, xml);

        // -------- Step 5. Loading a model to the device --------
        ov::CompiledModel compiled_model = core.compile_model(model, device_name);

        // -------- Step 6. Create an infer request --------
        ov::InferRequest infer_request = compiled_model.create_infer_request();
        // -----------------------------------------------------------------------------------------------------

        // -------- Step 7. Prepare input --------
        infer_request.set_input_tensor(input_tensor);

        // -------- Step 8. Do inference synchronously --------
        infer_request.infer();

        // -------- Step 9. Process output
        const ov::Tensor& output_tensor = infer_request.get_output_tensor();

        // Print classification results
        ClassificationResult classification_result(output_tensor, {image_path});
        classification_result.show();
        // -----------------------------------------------------------------------------------------------------
    } catch (const std::exception& ex) {
        std::cerr << ex.what() << std::endl;
        return EXIT_FAILURE;
    }

    return EXIT_SUCCESS;
}
The final result is completely wrong.
[ INFO ] Build ................................. 2024.1.0-15008-f4afc983258-releases/2024/1
[ INFO ]
[ INFO ] Loading model files: D:\pyworkspace\openvino\public\mobilenet-v3-small-1.0-224-tf\FP16\mobilenet-v3-small-1.0-224-tf.xml
[ INFO ] model name: TensorFlow_Frontend_IR
[ INFO ] inputs
[ INFO ] input name: input_1
[ INFO ] input type: f32
[ INFO ] input shape: [1,224,224,3]
[ INFO ] outputs
[ INFO ] output name: Predictions
[ INFO ] output type: f32
[ INFO ] output shape: [1,1000]
[ INFO ] model name: TensorFlow_Frontend_IR
[ INFO ] inputs
[ INFO ] input name: input_1
[ INFO ] input type: u8
[ INFO ] input shape: [1,3,224,224]
[ INFO ] outputs
[ INFO ] output name: Predictions
[ INFO ] output type: f32
[ INFO ] output shape: [1,1000]
Top 10 results:
Image ./img/dog.bmp
classid probability
------- -----------
905 0.0614840
782 0.0582295
409 0.0387410
418 0.0342315
530 0.0262313
688 0.0240049
916 0.0237779
851 0.0187113
446 0.0140004
885 0.0129699
Step-by-step reproduction
No response
Relevant log output
No response
Issue submission checklist
- [X] I'm reporting an issue. It's not a question.
- [X] I checked the problem with the documentation, FAQ, open issues, Stack Overflow, etc., and have not found a solution.
- [X] There is reproducer code and related data files such as images, videos, models, etc.
ppp.input().model().set_layout("NCHW");
doesn't modify your model. It sets information for ppp about the internal model layout, so that
ppp.input().tensor().set_shape(input_shape).set_element_type(input_type).set_layout(tensor_layout);
knows whether a transpose is required.
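In other words, for this particular model the IR input is [1,224,224,3] (see the log above), so the model layout really is NHWC. A minimal sketch of a consistent configuration, reusing the variables from the sample above and assuming the format reader returns packed HWC pixels:

// Tensor side: describe the data actually being wrapped (packed HWC pixels -> NHWC).
ov::Shape input_shape = {1, reader->height(), reader->width(), 3};
ov::Tensor input_tensor(ov::element::u8, input_shape, input_data.get());

ov::preprocess::PrePostProcessor ppp(model);
ppp.input().tensor()
    .set_shape(input_shape)
    .set_element_type(ov::element::u8)
    .set_layout("NHWC");                 // layout of the provided data
ppp.input().preprocess().resize(ov::preprocess::ResizeAlgorithm::RESIZE_LINEAR);
// Model side: the layout the model was trained/exported with (NHWC for this TF model).
// Only if the model were actually NCHW would setting "NCHW" here make
// PrePostProcessor insert the NHWC -> NCHW transpose.
ppp.input().model().set_layout("NHWC");
model = ppp.build();

Note that wrapping the same HWC pixel buffer in a {1, 3, H, W} shape does not reorder the underlying bytes, so after that change the data no longer matches the declared layout.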
ppp.input().model().set_layout("NCHW");
Why does the layout need to be set in two places? What is the difference?
// First define layout for your tensor
ppp.input("input").tensor().set_layout("NHWC");
// Then define layout of model
ppp.input("input").model().set_layout("NCHW");
The first is the input (tensor) layout: the layout you are going to follow when providing the input for inference. The second is the model layout: how the model weights are ordered. You can't influence the weights layout, and you are expected to know it from how the model was trained.
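For illustration, a sketch of the case where the two layouts differ and PrePostProcessor therefore inserts a conversion (assuming a hypothetical model whose input really is NCHW, fed from an NHWC image buffer):

// Hypothetical NCHW model fed with NHWC image data (sketch only).
ov::preprocess::PrePostProcessor p(model);
p.input().tensor().set_layout("NHWC");   // how the data you provide is laid out
p.input().model().set_layout("NCHW");    // how the model's weights expect it
// Because the two layouts differ, build() inserts an NHWC -> NCHW transpose
// in front of the original model input; the data itself stays NHWC.
model = p.build();

When the two layouts are the same, no conversion is added.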
I am having a related(?) problem with an NPU single-layer test for MVN. The config below passes against the OV reference:
void configure_model() override {
    ov::preprocess::PrePostProcessor p(function);
    p.input(0).tensor().set_layout(ov::Layout("NCHW"));
    p.input(0).model().set_layout(ov::Layout("NCHW"));
    p.output(0).model().set_layout(ov::Layout("NCHW"));
    p.output(0).tensor().set_layout(ov::Layout("NCHW"));
}
But the NHWC config fails:
void configure_model() override {
    ov::preprocess::PrePostProcessor p(function);
    p.input(0).tensor().set_layout(ov::Layout("NHWC"));
    p.input(0).model().set_layout(ov::Layout("NHWC"));
    p.output(0).model().set_layout(ov::Layout("NHWC"));
    p.output(0).tensor().set_layout(ov::Layout("NHWC"));
}
The problem I see is that the OV-dumped .ref output is the same in both cases, but it shouldn't be. This used to work fine in the past; I'm not sure what changed recently.
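One way to sanity-check whether a given layout configuration actually changes the built function (a sketch, assuming `function` is the test's ov::Model as in the snippets above) is to compare the input shape before and after build():

// Sketch: capture the shape before build(), since build() modifies the model in place.
auto shape_before = function->input(0).get_partial_shape();
ov::preprocess::PrePostProcessor p(function);
p.input(0).tensor().set_layout(ov::Layout("NHWC"));
p.input(0).model().set_layout(ov::Layout("NCHW"));
auto built = p.build();
// If a layout conversion was inserted, the built input shape is the permuted
// version of the original one; identical shapes mean no conversion was added.
std::cout << "before: " << shape_before
          << " after: " << built->input(0).get_partial_shape() << std::endl;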
It seems I'm not educated enough to understand the problem description. @Maxim-Doronin, can you take a look?