
How to run openvino on a single thread?

navyverma opened this issue 1 year ago

I am using OpenVINO to run inference on a model, and I want to compile the model and run inference on a single thread. Below is the code.

  // model compilation
  ov::Core core;
  std::shared_ptr<ov::Model> model = core.read_model(file_path);
  const ov::Layout tensor_layout{"NHWC"};
  ov::preprocess::PrePostProcessor ppp(model);
  ov::preprocess::InputInfo& input_info = ppp.input();
  input_info.tensor().set_element_type(ov::element::f32).set_layout(tensor_layout);
  input_info.model().set_layout("NCHW");
  ppp.output().tensor().set_element_type(ov::element::f32);
  ppp.output().postprocess().convert_layout({0, 2, 1});
  model = ppp.build();
  ov::CompiledModel compiled_model = core.compile_model(model, device_name);

  // model inference
  cv::Mat image = cv::imread("./test.jpg");
  cv::Mat resized_image;
  cv::resize(image, resized_image, cv::Size(640, 640));   // match the declared 640x640 input
  cv::Mat float_image;
  resized_image.convertTo(float_image, CV_32FC3);         // tensor below is declared f32
  ov::element::Type input_type = ov::element::f32;
  ov::Shape input_shape = {1, 640, 640, 3};
  ov::Tensor input_tensor = ov::Tensor(input_type, input_shape, float_image.data);

  ov::InferRequest infer_request = compiled_model.create_infer_request();
  infer_request.set_input_tensor(input_tensor);
  infer_request.infer();
  ov::Tensor output_tensor = infer_request.get_output_tensor();

  float* data = output_tensor.data<float>();
  cv::Mat outputMat(output_tensor.get_shape()[1], output_tensor.get_shape()[2], CV_32F, data);

navyverma avatar Jul 10 '24 07:07 navyverma

ov::inference_num_threads limits the number of logical processors used for CPU inference.

compiled_model = core.compile_model(model, device_name, ov::inference_num_threads(1));
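As a quick check (a sketch, not part of the original answer), the effective value can be read back from the compiled model:

  // Read back the value the plugin actually applied after compilation.
  auto n = compiled_model.get_property(ov::inference_num_threads);
  std::cout << "inference_num_threads = " << n << std::endl;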

YuChern-Intel avatar Jul 12 '24 03:07 YuChern-Intel

@YuChern-Intel I have tried the above-mentioned solution, but it still uses multiple threads.

navyverma avatar Jul 18 '24 05:07 navyverma

In that case, you have to build open-source OpenVINO with -DTHREADING=SEQ to disable threading optimizations (an illustrative build invocation is sketched below).

Ref: 17566
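For reference, a typical open-source build invocation might look like this (illustrative; the clone options and extra cmake flags are assumptions, not from the original comment):

  git clone --recursive https://github.com/openvinotoolkit/openvino.git
  cd openvino && mkdir build && cd build
  cmake -DTHREADING=SEQ -DCMAKE_BUILD_TYPE=Release ..
  cmake --build . --parallel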

YuChern-Intel avatar Jul 18 '24 15:07 YuChern-Intel

@YuChern-Intel Let me try this, then I will update.

navyverma avatar Jul 19 '24 10:07 navyverma

> @YuChern-Intel I have tried the above-mentioned solution, but it still uses multiple threads.

@navyverma Could you please clarify how exactly you analyzed the number of used threads? Using ov::inference_num_threads(1) is expected to be enough for single-threaded inference, so I would really like to root-cause the issue. cc @wangleis
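For example, on Linux the thread count can be checked from inside the application with something like this (a minimal sketch, not part of the original comment; assumes a /proc filesystem):

  #include <fstream>
  #include <iostream>
  #include <string>

  // Print the kernel-reported thread count of the current process (Linux only).
  void print_thread_count() {
      std::ifstream status("/proc/self/status");
      for (std::string line; std::getline(status, line);) {
          if (line.rfind("Threads:", 0) == 0) {   // e.g. "Threads:  1"
              std::cout << line << std::endl;
              break;
          }
      }
  }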

dmitry-gorokhov avatar Jul 22 '24 07:07 dmitry-gorokhov

@navyverma May I know which of these matches your expectation?

  1. The application creates only one thread and runs inference on that thread.
  2. The application creates multiple threads but runs inference on only one thread.

wangleis avatar Jul 22 '24 08:07 wangleis

@wangleis My use case is to create only one thread and run inference on that thread.

navyverma avatar Jul 22 '24 09:07 navyverma

@navyverma Since OpenVINO supports both synchronous and asynchronous inference, OpenVINO creates an additional thread for asynchronous inference while compiling the model, even with ov::inference_num_threads(1). If the user then calls infer_request.infer() to trigger synchronous inference, inference runs on the original thread.

So the application creates two threads but only runs inference on one. May I know if that is acceptable in your use case?

If yes, please try the steps below (a code sketch follows the list):

  1. Build OpenVINO with -DTHREADING=SEQ
  2. Compile the model with ov::inference_num_threads(1)
  3. Use infer_request.infer() to trigger synchronous inference
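In code, steps 2 and 3 might look like this (a sketch; the model path and device are placeholders):

  // Sketch of steps 2 and 3, assuming an OpenVINO build configured with -DTHREADING=SEQ.
  ov::Core core;
  auto model = core.read_model("model.xml");  // placeholder path
  ov::CompiledModel compiled_model =
      core.compile_model(model, "CPU", ov::inference_num_threads(1));
  ov::InferRequest infer_request = compiled_model.create_infer_request();
  // ... set input tensors here ...
  infer_request.infer();  // synchronous inference runs on the calling thread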

wangleis avatar Jul 22 '24 14:07 wangleis

@wangleis No, my application cannot use two threads. The main thread should handle both model compilation and inference, so a single thread for the complete application is my use case.

navyverma avatar Jul 23 '24 06:07 navyverma

@navyverma Then this is a new requirement for OpenVINO. In the future, OpenVINO will need a new property to indicate sync or async inference at compile time, and to use the main thread to compile the model for sync inference. Will create a PR for this requirement. Thanks.

wangleis avatar Jul 23 '24 07:07 wangleis

@wangleis Any update on this?

navyverma avatar Jul 29 '24 06:07 navyverma

@navyverma This feature has been planned and added to the development pipeline. Will update when the feature is ready.

wangleis avatar Jul 29 '24 06:07 wangleis

@navyverma Could you try compiling the model with ov::num_streams(0)?
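That is, something like the following minimal change (a sketch reusing model and device_name from the code above):

  // Compile with zero streams so no dedicated inference streams are created.
  compiled_model = core.compile_model(model, device_name, ov::num_streams(0));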

wangleis avatar Aug 06 '24 04:08 wangleis

@wangleis I have one more question. Does openvino support fork?

navyverma avatar Aug 21 '24 06:08 navyverma

> @wangleis I have one more question. Does openvino support fork?

Yes.

wangleis avatar Aug 21 '24 06:08 wangleis

@wangleis Do I need to enable or set up anything at runtime for fork support?

navyverma avatar Aug 21 '24 10:08 navyverma

@wangleis I am trying to load the networks in the parent process while the inference calls are made in the forked child. The infer() call hangs in the child process; the same call works fine when performed in the parent process.

navyverma avatar Aug 21 '24 11:08 navyverma

@navyverma I may have misunderstood the word "fork" in the question "Does openvino support fork?"

In your case, may I know if you compile the model in the child process?

wangleis avatar Aug 21 '24 14:08 wangleis

@wangleis I am compiling the model in the parent process and then performing inference in multiple forked child processes.

navyverma avatar Aug 22 '24 04:08 navyverma

@wangleis Sorry for the earlier update; my application is still using multiple threads even with ov::num_streams(0).

navyverma avatar Aug 22 '24 08:08 navyverma

@navyverma I made a demo for testing fork and the result is OK. The code is as follows:

    // Requires <sys/wait.h> and <unistd.h>; FormatReader is the image-reading
    // helper from the OpenVINO samples.
    ov::Core core;
    std::shared_ptr<ov::Model> model = core.read_model(model_path);
    FormatReader::ReaderPtr reader(image_path.c_str());
    if (reader.get() == nullptr) {
        throw std::logic_error("Failed to read image: " + image_path);
    }
    ov::element::Type input_type = ov::element::u8;
    ov::Shape input_shape = {1, reader->height(), reader->width(), 3};
    std::shared_ptr<unsigned char> input_data = reader->getData();
    ov::Tensor input_tensor = ov::Tensor(input_type, input_shape, input_data.get());
    const ov::Layout tensor_layout{"NHWC"};
    ov::preprocess::PrePostProcessor ppp(model);
    ppp.input().tensor().set_shape(input_shape).set_element_type(input_type).set_layout(tensor_layout);
    ppp.input().preprocess().resize(ov::preprocess::ResizeAlgorithm::RESIZE_LINEAR);
    ppp.input().model().set_layout("NCHW");
    ppp.output().tensor().set_element_type(ov::element::f32);
    model = ppp.build();
    ov::CompiledModel compiled_model = core.compile_model(model, device_name);

    // Compile in the parent, then run inference in both parent and child.
    pid_t pid = fork();
    if (pid >= 0) {
        ov::InferRequest infer_request = compiled_model.create_infer_request();
        infer_request.set_input_tensor(input_tensor);
        infer_request.infer();
        const ov::Tensor& output_tensor = infer_request.get_output_tensor();
        if (pid == 0) {
            exit(0);  // child exits after its inference completes
        }
    } else {
        std::cout << "fork error\n";
    }
    if (pid > 0) {
        int returnStatus;
        waitpid(pid, &returnStatus, 0);
        if (returnStatus == 0) {
            std::cout << "child process end\n";
        }
    }

sunxiaoxia2022 avatar Aug 23 '24 03:08 sunxiaoxia2022

@navyverma May I know which OpenVINO version you are using? Could you try the example with the master branch or the latest release?

wangleis avatar Aug 23 '24 04:08 wangleis

@wangleis The OpenVINO version is 2023.3.0-13775-ceeafaf64f3-releases/2023/3. I will try with the latest master branch and update you.

@sunxiaoxia2022 The above example runs on multiple threads. In my use case, my application should use only a single thread, and I am creating multiple forked processes inside my application.

navyverma avatar Aug 23 '24 05:08 navyverma

@wangleis I ran my application on the latest OpenVINO version, 2024.3.0-16041-1e3b88e4e3f-releases/2024/3, and the application still uses multiple threads.

navyverma avatar Aug 23 '24 06:08 navyverma

> @wangleis I ran my application on the latest OpenVINO version, 2024.3.0-16041-1e3b88e4e3f-releases/2024/3, and the application still uses multiple threads.

@navyverma Please try ov::num_streams(0) on the master branch. The fix will be included in the next release.

wangleis avatar Aug 27 '24 05:08 wangleis

@wangleis I tried ov::num_streams(0) on the master branch with OpenVINO version 2024.4.0-16446-eb16f7fbd0d, and my application started using a single thread.

navyverma avatar Aug 28 '24 06:08 navyverma

> @navyverma Since OpenVINO supports both synchronous and asynchronous inference, OpenVINO creates an additional thread for asynchronous inference while compiling the model, even with ov::inference_num_threads(1). If the user then calls infer_request.infer() to trigger synchronous inference, inference runs on the original thread.
>
> So the application creates two threads but only runs inference on one. May I know if that is acceptable in your use case?
>
> If yes, please try the steps below:
>
>   1. Build OpenVINO with -DTHREADING=SEQ
>   2. Compile the model with ov::inference_num_threads(1)
>   3. Use infer_request.infer() to trigger synchronous inference

After following all of the above instructions, the program still runs multi-threaded. OpenVINO version 2025, built with the -DTHREADING=SEQ option. Here is my init function:

bool initialize(const std::string& scrfd_model_path, const std::string& adaface_model_path,
                const std::string& device, int input_height, int input_width) {
try {
    // Store parameters
    device_ = device;
    input_height_ = input_height;
    input_width_ = input_width;

    // Force CPU to use single-threaded mode at core level
    core_.set_property(ov::cache_dir(""));
    if (device == "CPU") {
        try {
            // Try to set global CPU properties to force single threading
            core_.set_property("CPU", ov::inference_num_threads(1));
            core_.set_property("CPU", ov::streams::num(1));
        } catch (...) {
            // If properties fail, continue with local config
            std::cout << "Warning: Could not set global CPU properties" << std::endl;
        }
    }
    
    // Initialize SCRFD model
    auto scrfd_model = core_.read_model(scrfd_model_path);
    
    // Set custom input shape for SCRFD
    ov::PartialShape new_input_shape = {1, 3, input_height, input_width};
    std::map<std::string, ov::PartialShape> port_to_shape;
    port_to_shape[scrfd_model->input().get_any_name()] = new_input_shape;
    scrfd_model->reshape(port_to_shape);
    
    // Get input/output info after reshape
    auto input_port = scrfd_model->input();
    input_shape_ = input_port.get_shape();
    num_outputs_ = scrfd_model->outputs().size();
    
    // Compile SCRFD model with absolute zero child threads
    ov::AnyMap scrfd_config;
    scrfd_config[ov::inference_num_threads.name()] = 1;
    scrfd_config[ov::streams::num.name()] = 1;
    scrfd_config[ov::enable_profiling.name()] = false;
    scrfd_config[ov::hint::performance_mode.name()] = ov::hint::PerformanceMode::LATENCY;
    scrfd_config[ov::hint::execution_mode.name()] = ov::hint::ExecutionMode::ACCURACY;
    
    // Force single core usage - try CPU:0 explicitly
    if (device == "CPU") {
        try {
            scrfd_model_ = core_.compile_model(scrfd_model, "CPU:0", scrfd_config);
        } catch (...) {
            scrfd_model_ = core_.compile_model(scrfd_model, device, scrfd_config);
        }
    } else {
        scrfd_model_ = core_.compile_model(scrfd_model, device, scrfd_config);
    }
    scrfd_request_ = scrfd_model_.create_infer_request();
    
    // Initialize SCRFD context
    initializeSCRFDContext();
    
    std::cout << "SCRFD model loaded successfully!" << std::endl;
    std::cout << "Input shape: [" << input_shape_[0] << ", " << input_shape_[1] 
              << ", " << input_shape_[2] << ", " << input_shape_[3] << "]" << std::endl;
    std::cout << "Number of outputs: " << num_outputs_ << std::endl;
    
    // Initialize AdaFace model with absolute zero child threads
    auto adaface_model = core_.read_model(adaface_model_path);
    ov::AnyMap adaface_config;
    adaface_config[ov::inference_num_threads.name()] = 1;
    adaface_config[ov::streams::num.name()] = 1;
    adaface_config[ov::enable_profiling.name()] = false;
    adaface_config[ov::hint::performance_mode.name()] = ov::hint::PerformanceMode::LATENCY;
    adaface_config[ov::hint::execution_mode.name()] = ov::hint::ExecutionMode::ACCURACY;
    
    // Force single core usage - try CPU:0 explicitly
    if (device == "CPU") {
        try {
            adaface_model_ = core_.compile_model(adaface_model, "CPU:0", adaface_config);
        } catch (...) {
            adaface_model_ = core_.compile_model(adaface_model, device, adaface_config);
        }
    } else {
        adaface_model_ = core_.compile_model(adaface_model, device, adaface_config);
    }
    adaface_request_ = adaface_model_.create_infer_request();
    
    std::cout << "AdaFace model loaded successfully!" << std::endl;
    std::cout << "Device: " << device << std::endl;
    
    initialized_ = true;
    return true;
}
catch (const std::exception& e) {
    std::cerr << "Error initializing FaceRecognition: " << e.what() << std::endl;
    return false;
}

}

RamatovInomjon avatar Jul 22 '25 01:07 RamatovInomjon

@RamatovInomjon Please try ov::num_streams(0).
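In the AnyMap-based config used above, that would be a change like this (a sketch; scrfd_config is the variable from the posted init function):

  // Zero streams instead of one, as suggested earlier in this thread.
  scrfd_config[ov::num_streams.name()] = 0;
  adaface_config[ov::num_streams.name()] = 0;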

sunxiaoxia2022 avatar Jul 22 '25 02:07 sunxiaoxia2022