openvino
[Good First Issue]: [OP CONFORMANCE][TEMPLATE] One Reduce op test is failed in op conformance over template
Context
The OP conformance suite is a validation tool that checks a plugin's condition from the operation implementation status perspective.
OP conformance is based on operations and subgraphs extracted from the OMZ model scope by the ov_subgraphs_dumper tool. The extracted graphs are saved as IR (OpenVINO Intermediate Representation) and stored in a public share.
OP conformance suite contains the following test types:
- `OpImplCheck` validates operation support by the plugin. Returns `true` or `false`.
- `Inference` compares device inference results against reference results over the extracted IR as a model. We generate synthetic tensors to get a result. Possible results are `passed`, `failed`, `hanged` (interrupted by timeout), `crashed` and `skipped` (in case `-shape_mode` is misaligned with the graph inputs).
- `QueryModel` checks whether the graph can be executed on the device. Has the same statuses as `Inference`.
- `ImportExport` exports the compiled model, imports it back and checks that the models are the same. Has the same statuses as `Inference`.
TEMPLATE is a simple plugin that runs inference using the reference implementations. This means that if we run conformance over TEMPLATE, we compare TEMPLATE vs TEMPLATE results. Sometimes a bug inside the plugin, a reference implementation or the test infrastructure leads to a negative test result.
What needs to be done?
Description:
conformance_ReduceSum/ReadIRTest.Inference/Op=ReduceSum.1_Type=f32_Shape=dynamic_IR=d11097e7fa04dc0b540bf3b963cde252591b39b7dcbfae66e64ed19cd2b3b06e_Device=TEMPLATE_Config=() is failing.
How to reproduce:
- Build OV with -DENABLE_TESTS=ON -DENABLE_FUNCTIONAL_TESTS=ON
- Build the `ov_op_conformance_tests` target
- Run OP conformance to download the conformance IRs and run the executable with the following args:
python3 /openvino/src/tests/test_utils/functional_test_utils/layer_tests_summary/run_conformance.py -d=TEMPLATE --gtest_filter="conformance_ReduceSum/ReadIRTest.Inference/Op=ReduceSum.1_Type=f32_Shape=dynamic_IR=d11097e7fa04dc0b540bf3b963cde252591b39b7dcbfae66e64ed19cd2b3b06e_Device=TEMPLATE_Config=(),"
NOTE:
The `run_conformance` log contains all the commands used to run `ov_op_conformance_tests`! You can take them for debugging!
- Check the logs inside the working directory and get the results:
[ RUN ] conformance_ReduceSum/ReadIRTest.Inference/Op=ReduceSum.1_Type=f32_Shape=dynamic_IR=d11097e7fa04dc0b540bf3b963cde252591b39b7dcbfae66e64ed19cd2b3b06e_Device=TEMPLATE_Config=()
MEM_USAGE=55352KB
[ CONFORMANCE ] Influence coefficient: 2.89071e-06
[ PLUGIN ] `SubgraphBaseTest::compile_model()` is started
[ PLUGIN ] `SubgraphBaseTest::compile_model()` is finished successfully. Duration is 0.00740622s
[ REFERENCE ] `SubgraphBaseTest::calculate_refs()` is started
[ REFERENCE ] Calculate reference in runtime
[ REFERENCE ] `SubgraphBaseTest::calculate_refs()` is started
[ PLUGIN ] `SubgraphBaseTest::get_plugin_outputs()` is started
[ PLUGIN ] `SubgraphBaseTest::get_plugin_outputs()` is finished successfully. Duration is 0.0024105s
[ REFERENCE ] `SubgraphBaseTest::calculate_refs()` is finished successfully. Duration is 0.0155289s
[ REFERENCE ] `SubgraphBaseTest::calculate_refs()` is finished successfully. Duration is 0.0275021s
[ COMPARATION ] `ov_tensor_utils.hpp::compare()` is started
[ COMPARATION ] rel_threshold: 0.12545
[ COMPARATION ] abs_threshold: 0.12545
[ COMPARATION ] `ov_tensor_utils.hpp::compare()` is finished successfully. Duration is 0.00113002s
[ PLUGIN ] `SubgraphBaseTest::get_plugin_outputs()` is started
[ REFERENCE ] `SubgraphBaseTest::calculate_refs()` is started
[ REFERENCE ] Calculate reference in runtime
[ REFERENCE ] `SubgraphBaseTest::calculate_refs()` is started
[ PLUGIN ] `SubgraphBaseTest::get_plugin_outputs()` is finished successfully. Duration is 137.937s
[ REFERENCE ] `SubgraphBaseTest::calculate_refs()` is finished successfully. Duration is 139.669s
[ REFERENCE ] `SubgraphBaseTest::calculate_refs()` is finished successfully. Duration is 139.669s
[ COMPARATION ] `ov_tensor_utils.hpp::compare()` is started
[ COMPARATION ] rel_threshold: 153.087
[ COMPARATION ] abs_threshold: 153.087
[ COMPARATION ] `ov_tensor_utils.hpp::compare()` is finished successfully. Duration is 0.0123941s
src/tests/functional/shared_test_classes/src/base/ov_subgraph.cpp:96: Failure
Exception from src/core/src/runtime/allocator.cpp:69:
std::bad_alloc
[ FAILED ] conformance_ReduceSum/ReadIRTest.Inference/Op=ReduceSum.1_Type=f32_Shape=dynamic_IR=d11097e7fa04dc0b540bf3b963cde252591b39b7dcbfae66e64ed19cd2b3b06e_Device=TEMPLATE_Config=(), where GetParam() = (("/home/efode/repo/openvino/src/tests/test_utils/functional_test_utils/layer_tests_summary/temp/template_conformance/models/2023.3.0-13657-d5b0f4d2d73/operation/dynamic/ReduceSum-1/f32/d11097e7fa04dc0b540bf3b963cde252591b39b7dcbfae66e64ed19cd2b3b06e.xml", ""), "TEMPLATE", {}) (144870 ms)
- To debug the test in C++, use the following command (just an example; refer to the note in the third item):
openvino/bin/intel64/Release/ov_op_conformance_tests --device=TEMPLATE --input_folders=openvino/src/tests/test_utils/functional_test_utils/layer_tests_summary/temp/models/conformance_ir_files.lst, --report_unique_name --output_folder="openvino/src/tests/test_utils/functional_test_utils/layer_tests_summary/temp/report/parallel" --gtest_filter="conformance_ReduceSum/ReadIRTest.Inference/Op=ReduceSum.1_Type=f32_Shape=dynamic_IR=d11097e7fa04dc0b540bf3b963cde252591b39b7dcbfae66e64ed19cd2b3b06e_Device=TEMPLATE_Config=()," --config_path="" --shape_mode=
Expected result: Passed status for the mentioned test.
Example Pull Requests
No response
Resources
- Contribution guide - start here!
- Intel DevHub Discord channel - engage in discussions, ask questions and talk to OpenVINO developers
- Conformance readme
- TEMPLATE plugin
Contact points
@iefode
Ticket
No response
@iefode can I pick this up?
@manangoel99 Sure! Please add comment: .take
Thank you for looking into this issue! Please let us know if you have any questions or require any help.
Thanks for the response @iefode along with the detailed issue description, will start looking into this today
I'm reopening the task due to current assignee's inactivity. If you're still working on this please let us know.
.take
Thank you for looking into this issue! Please let us know if you have any questions or require any help.
Hello @mahajanparth, are you still working on this? Is there anything we could help you with?
Hello @mahajanparth!
What is the status of the issue? Please, let us know in case of any issues
We will unassign the ticket from you in case of inaction
hi @iefode can I take this up?
it's yours, @keshav2800
hi @mlukasze, actually I am facing some problems reproducing this issue.
When I try to build OV from the GitHub repo it shows an error, and when I looked into it I found that many of the packages in /openvino/thirdparty have been relocated.
Can you let me know any other way I can reproduce this and start working on it?
did you update the submodules?
git submodule update --init --recursive
@keshav2800 - do you need any support or you don't have to finish this task?
I am sorry @mlukasze . I want to opt out.
no worries, thank you for your time :)
.take
Thank you for looking into this issue! Please let us know if you have any questions or require any help.
Hello @vsriramv, do you have any questions or require any help?
.take
Thank you for looking into this issue! Please let us know if you have any questions or require any help.
Hello! @iefode
I have been working on fixing the ReduceSum memory allocation issue. I found that the problem was in reduce_sum.hpp, where a large vector is allocated for Kahan summation, causing std::bad_alloc with big tensors.
I implemented a fix by adding a threshold check (100 million elements) to use simple addition for very large tensors instead of Kahan summation, which requires extra memory. This saves memory while keeping good precision for normal-sized inputs.
When testing my fix, I can see the ReduceSum implementation now works better: the reference calculation completes without errors. But the test still fails with std::bad_alloc from src/core/src/runtime/allocator.cpp:71 in a different part of the code.
Can you please advise if I should proceed with committing my current fix for reduce_sum.hpp, or if I should investigate the allocator issue further before submitting my changes?
I'd like to follow up on the ReduceSum memory allocation issue. I've implemented a fix in reduce_sum.hpp that prevents the main memory allocation failure by using a different algorithm for large tensors. While this solves the core issue in the operation itself, there's still a secondary std::bad_alloc error from the runtime allocator (src/core/src/runtime/allocator.cpp:71) occurring after computation completes.
@iefode @mlukasze @p-wysocki Could you please advise on how to proceed? Should I submit my current fix for the ReduceSum implementation, or should I investigate the allocator issue further?
bump @iefode