
[Good First Issue]: [OP CONFORMANCE][TEMPLATE] One Reduce op test is failed in op conformance over template

Open iefode opened this issue 1 year ago • 12 comments

Context

The OP conformance suite is a validation tool that checks a plugin's operation implementation status. It is based on operations and graphs extracted from the OMZ model scope by the ov_subgraphs_dumper tool. The extracted graphs are saved as IR (OpenVINO Intermediate Representation) and stored in the public share. The OP conformance suite contains the following test types:

  • OpImplCheck validates that the plugin supports an operation. It returns true or false.
  • Inference compares device inference results against the reference, using an extracted IR as the model. Synthetic tensors are generated to produce a result. Possible results are passed, failed, hanged (interrupted by timeout), crashed, and skipped (only when -shape_mode is misaligned with the graph inputs).
  • QueryModel checks whether the graph can be executed on the device. It has the same statuses as Inference.
  • ImportExport exports a compiled model, imports it, and checks that the two models are the same. It has the same statuses as Inference.

Template is a simple plugin that runs inference using the reference implementation. This means that when we run conformance over Template, we compare TEMPLATE results against TEMPLATE results. Sometimes bugs inside the plugin, the reference implementation, or the test infrastructure lead to a negative test result.
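Because a TEMPLATE run compares the reference implementation against itself, a healthy setup should report zero mismatches under any tolerance. The block below is a minimal, generic sketch of a tolerance check with an absolute and a relative threshold. It is not the logic of ov_tensor_utils.hpp::compare(); the names within_tolerance and count_mismatches, and the rule that a value passes if either threshold is met, are assumptions made for illustration.

```cpp
#include <algorithm>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical helper: true when `actual` is close enough to `expected`.
// Assumption of this sketch: a value passes if EITHER the absolute or the
// relative error is within its threshold.
inline bool within_tolerance(float actual, float expected,
                             float abs_threshold, float rel_threshold) {
    if (actual == expected) {
        return true;  // also covers equal infinities
    }
    if (std::isnan(actual) || std::isnan(expected)) {
        return std::isnan(actual) && std::isnan(expected);
    }
    const float abs_err = std::fabs(actual - expected);
    if (abs_err <= abs_threshold) {
        return true;
    }
    const float denom = std::fabs(expected);
    return denom > 0.0f && abs_err / denom <= rel_threshold;
}

// Counts elements of `plugin` that differ from `reference` beyond tolerance.
// A size mismatch is reported as the larger of the two sizes.
inline std::size_t count_mismatches(const std::vector<float>& plugin,
                                    const std::vector<float>& reference,
                                    float abs_threshold, float rel_threshold) {
    if (plugin.size() != reference.size()) {
        return std::max(plugin.size(), reference.size());
    }
    std::size_t mismatches = 0;
    for (std::size_t i = 0; i < plugin.size(); ++i) {
        if (!within_tolerance(plugin[i], reference[i], abs_threshold, rel_threshold)) {
            ++mismatches;
        }
    }
    return mismatches;
}
```

Calling count_mismatches with the same vector on both sides returns 0 even with both thresholds set to zero, which is the outcome expected from a TEMPLATE vs TEMPLATE comparison.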

What needs to be done?

Description: conformance_ReduceSum/ReadIRTest.Inference/Op=ReduceSum.1_Type=f32_Shape=dynamic_IR=d11097e7fa04dc0b540bf3b963cde252591b39b7dcbfae66e64ed19cd2b3b06e_Device=TEMPLATE_Config=() fails.

How to reproduce:

  1. Build OV using -DENABLE_TESTS=ON -DENABLE_FUNCTIONAL_TESTS=ON
  2. Build the ov_op_conformance_tests target
  3. Run OP conformance to download the conformance IRs and run the executable with these arguments:
python3 /openvino/src/tests/test_utils/functional_test_utils/layer_tests_summary/run_conformance.py -d=TEMPLATE --gtest_filter="conformance_ReduceSum/ReadIRTest.Inference/Op=ReduceSum.1_Type=f32_Shape=dynamic_IR=d11097e7fa04dc0b540bf3b963cde252591b39b7dcbfae66e64ed19cd2b3b06e_Device=TEMPLATE_Config=(),"

NOTE: The run_conformance log contains all the commands used to run ov_op_conformance_tests. You can reuse them for debugging.

  4. Check the logs inside the working directory and get the results:
[ RUN      ] conformance_ReduceSum/ReadIRTest.Inference/Op=ReduceSum.1_Type=f32_Shape=dynamic_IR=d11097e7fa04dc0b540bf3b963cde252591b39b7dcbfae66e64ed19cd2b3b06e_Device=TEMPLATE_Config=()

MEM_USAGE=55352KB
[ CONFORMANCE ] Influence coefficient: 2.89071e-06
[ PLUGIN      ] `SubgraphBaseTest::compile_model()` is started
[ PLUGIN      ] `SubgraphBaseTest::compile_model()` is finished successfully. Duration is 0.00740622s
[ REFERENCE   ] `SubgraphBaseTest::calculate_refs()` is started
[ REFERENCE   ] Calculate reference in runtime
[ REFERENCE   ] `SubgraphBaseTest::calculate_refs()` is started
[ PLUGIN      ] `SubgraphBaseTest::get_plugin_outputs()` is started
[ PLUGIN      ] `SubgraphBaseTest::get_plugin_outputs()` is finished successfully. Duration is 0.0024105s
[ REFERENCE   ] `SubgraphBaseTest::calculate_refs()` is finished successfully. Duration is 0.0155289s
[ REFERENCE   ] `SubgraphBaseTest::calculate_refs()` is finished successfully. Duration is 0.0275021s
[ COMPARATION ] `ov_tensor_utils.hpp::compare()` is started
[ COMPARATION ] rel_threshold: 0.12545
[ COMPARATION ] abs_threshold: 0.12545
[ COMPARATION ] `ov_tensor_utils.hpp::compare()` is finished successfully. Duration is 0.00113002s
[ PLUGIN      ] `SubgraphBaseTest::get_plugin_outputs()` is started
[ REFERENCE   ] `SubgraphBaseTest::calculate_refs()` is started
[ REFERENCE   ] Calculate reference in runtime
[ REFERENCE   ] `SubgraphBaseTest::calculate_refs()` is started
[ PLUGIN      ] `SubgraphBaseTest::get_plugin_outputs()` is finished successfully. Duration is 137.937s
[ REFERENCE   ] `SubgraphBaseTest::calculate_refs()` is finished successfully. Duration is 139.669s
[ REFERENCE   ] `SubgraphBaseTest::calculate_refs()` is finished successfully. Duration is 139.669s
[ COMPARATION ] `ov_tensor_utils.hpp::compare()` is started
[ COMPARATION ] rel_threshold: 153.087
[ COMPARATION ] abs_threshold: 153.087
[ COMPARATION ] `ov_tensor_utils.hpp::compare()` is finished successfully. Duration is 0.0123941s
src/tests/functional/shared_test_classes/src/base/ov_subgraph.cpp:96: Failure
Exception from src/core/src/runtime/allocator.cpp:69:
std::bad_alloc

[  FAILED  ] conformance_ReduceSum/ReadIRTest.Inference/Op=ReduceSum.1_Type=f32_Shape=dynamic_IR=d11097e7fa04dc0b540bf3b963cde252591b39b7dcbfae66e64ed19cd2b3b06e_Device=TEMPLATE_Config=(), where GetParam() = (("/home/efode/repo/openvino/src/tests/test_utils/functional_test_utils/layer_tests_summary/temp/template_conformance/models/2023.3.0-13657-d5b0f4d2d73/operation/dynamic/ReduceSum-1/f32/d11097e7fa04dc0b540bf3b963cde252591b39b7dcbfae66e64ed19cd2b3b06e.xml", ""), "TEMPLATE", {}) (144870 ms)

  5. To debug the test in C++, use the following command (just an example; refer to the note in the third item):
openvino/bin/intel64/Release/ov_op_conformance_tests --device=TEMPLATE --input_folders=openvino/src/tests/test_utils/functional_test_utils/layer_tests_summary/temp/models/conformance_ir_files.lst, --report_unique_name --output_folder="openvino/src/tests/test_utils/functional_test_utils/layer_tests_summary/temp/report/parallel" --gtest_filter="conformance_ReduceSum/ReadIRTest.Inference/Op=ReduceSum.1_Type=f32_Shape=dynamic_IR=d11097e7fa04dc0b540bf3b963cde252591b39b7dcbfae66e64ed19cd2b3b06e_Device=TEMPLATE_Config=()," --config_path="" --shape_mode=

Expected result: Passed status for the mentioned test.
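The test names in the steps above follow a regular pattern, so the --gtest_filter value can be assembled from its parts. The sketch below is only an illustration of that pattern as it appears in this issue: the struct ConformanceCase, its field names, and make_gtest_filter are hypothetical and are not part of the conformance tooling. It produces the name without the trailing comma seen in the commands above.

```cpp
#include <string>

// Hypothetical description of one conformance test case. The fields mirror
// the parts visible in the test name quoted in this issue.
struct ConformanceCase {
    std::string op;       // e.g. "ReduceSum"
    int version;          // the number after the dot in "Op=ReduceSum.1"
    std::string type;     // e.g. "f32"
    std::string shape;    // e.g. "dynamic"
    std::string ir_hash;  // hash that names the extracted IR file
    std::string device;   // e.g. "TEMPLATE"
};

// Builds the full test name in the form printed after "[ RUN      ]".
inline std::string make_gtest_filter(const ConformanceCase& c) {
    return "conformance_" + c.op + "/ReadIRTest.Inference/Op=" + c.op + "." +
           std::to_string(c.version) + "_Type=" + c.type + "_Shape=" + c.shape +
           "_IR=" + c.ir_hash + "_Device=" + c.device + "_Config=()";
}
```

Filling the struct with ReduceSum, 1, f32, dynamic, the IR hash from this issue, and TEMPLATE reproduces the failing test name shown in the log.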

Example Pull Requests

No response

Resources

Contact points

@iefode

Ticket

No response

iefode, Feb 09 '24 10:02

@iefode can I pick this up?

manangoel99, Feb 19 '24 18:02

@manangoel99 Sure! Please add a comment: .take

iefode, Feb 20 '24 07:02

Thank you for looking into this issue! Please let us know if you have any questions or require any help.

github-actions[bot], Feb 20 '24 07:02

Thanks for the response, @iefode, and for the detailed issue description. I will start looking into this today.

manangoel99, Feb 20 '24 08:02

I'm reopening the task due to the current assignee's inactivity. If you're still working on this, please let us know.

p-wysocki, Mar 12 '24 09:03

.take

mahajanparth, Mar 13 '24 07:03

Thank you for looking into this issue! Please let us know if you have any questions or require any help.

github-actions[bot], Mar 13 '24 07:03

Hello @mahajanparth, are you still working on this? Is there anything we could help you with?

p-wysocki, Apr 03 '24 18:04

Hello @mahajanparth!

What is the status of the issue? Please let us know if you run into any problems.

We will unassign the ticket from you in case of inaction.

iefode, Apr 25 '24 13:04

Hi @iefode, can I take this up?

keshav2800, Oct 14 '24 19:10

it's yours, @keshav2800

mlukasze, Oct 15 '24 04:10

Hi @mlukasze, I am actually having problems reproducing this issue.

When I try to build OV from the GitHub repo, I get an error. When I looked into it, I found that many of the packages in /openvino/thirdparty have been relocated.

Can you suggest another way to reproduce this so that I can start working on it?

keshav2800, Oct 20 '24 12:10

Did you update the submodules? git submodule update --init --recursive

mlukasze, Oct 21 '24 04:10

@keshav2800 - do you need any support, or would you rather not finish this task?

mlukasze, Nov 06 '24 07:11

I am sorry, @mlukasze. I want to opt out.

keshav2800, Nov 07 '24 06:11

no worries, thank you for your time :)

mlukasze, Nov 07 '24 06:11

.take

vsriramv, Feb 14 '25 17:02

Thank you for looking into this issue! Please let us know if you have any questions or require any help.

github-actions[bot], Feb 14 '25 17:02

Hello @vsriramv, do you have any questions or require any help?

p-wysocki, Mar 05 '25 12:03

.take

darshil929, Mar 13 '25 06:03

Thank you for looking into this issue! Please let us know if you have any questions or require any help.

github-actions[bot], Mar 13 '25 06:03

Hello! @iefode

I have been working on fixing the ReduceSum memory allocation issue. I found that the problem was in reduce_sum.hpp where a large vector is allocated for Kahan summation, causing std::bad_alloc with big tensors.

I implemented a fix by adding a threshold check (100 million elements) so that very large tensors use simple addition instead of Kahan summation, which requires extra memory. This saves memory while keeping good precision for normal-sized inputs.
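To make the trade-off concrete, here is a minimal sketch of the idea, not the actual change to reduce_sum.hpp. It reduces a row-major [rows x cols] tensor over axis 0. The function name reduce_sum_axis0, the constant kKahanMaxElements, and the choice to apply the threshold to the output element count (which is what sizes the compensation buffer in this sketch) are all assumptions made for illustration.

```cpp
#include <cstddef>
#include <stdexcept>
#include <vector>

// Threshold value taken from the comment above (100 million elements).
// Applying it to the OUTPUT element count is an assumption of this sketch.
constexpr std::size_t kKahanMaxElements = 100000000;

// Sums a row-major [rows x cols] tensor over axis 0, producing `cols` values.
// Kahan summation keeps one compensation term per output element, so it
// allocates a second buffer of size `cols`; plain addition does not.
inline std::vector<float> reduce_sum_axis0(const std::vector<float>& data,
                                           std::size_t rows,
                                           std::size_t cols,
                                           std::size_t kahan_max_elements = kKahanMaxElements) {
    if (data.size() != rows * cols) {
        throw std::invalid_argument("data size does not match rows * cols");
    }
    std::vector<float> out(cols, 0.0f);

    if (cols > kahan_max_elements) {
        // Fallback: plain accumulation, no extra buffer, lower precision.
        for (std::size_t r = 0; r < rows; ++r) {
            for (std::size_t c = 0; c < cols; ++c) {
                out[c] += data[r * cols + c];
            }
        }
        return out;
    }

    // Kahan (compensated) summation: track the rounding error per output.
    std::vector<float> compensation(cols, 0.0f);
    for (std::size_t r = 0; r < rows; ++r) {
        for (std::size_t c = 0; c < cols; ++c) {
            const float y = data[r * cols + c] - compensation[c];
            const float t = out[c] + y;
            compensation[c] = (t - out[c]) - y;
            out[c] = t;
        }
    }
    return out;
}
```

Passing a small value for kahan_max_elements forces the fallback path, which makes it easy to check that both paths agree on small inputs. Note that the output buffer itself is still allocated in both paths, so a threshold like this only removes the extra compensation buffer.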

When testing my fix, I see that the ReduceSum implementation now works better: the reference calculation completes without errors. However, the test still fails with std::bad_alloc from src/core/src/runtime/allocator.cpp:71, in a different part of the code.

Can you please advise if I should proceed with committing my current fix for reduce_sum.hpp, or if I should investigate the allocator issue further before submitting my changes?

darshil929, Mar 16 '25 07:03

I'd like to follow up on the ReduceSum memory allocation issue. I've implemented a fix in reduce_sum.hpp that prevents the main memory allocation failure by using a different algorithm for large tensors. While this solves the core issue in the operation itself, there's still a secondary std::bad_alloc error from the runtime allocator (src/core/src/runtime/allocator.cpp:71) occurring after computation completes.

@iefode @mlukasze @p-wysocki Could you please advise on how to proceed? Should I submit my current fix for the ReduceSum implementation, or should I investigate the allocator issue further?

darshil929, Mar 22 '25 07:03

bump @iefode

p-wysocki, Mar 24 '25 11:03