[Good First Issue]: Inconsistent behavior/errors of the Equal operation compared with the TF framework when using inf, -inf
Context
While implementing the layer test for the Equal operation of the TF framework, I found an issue with inf and -inf comparison.
The following tensor contents are used as source data, checking the important corner cases for float values:

x_corner = [1.    , 1.    , 1.     , np.nan, np.nan, np.nan , np.inf, np.inf , np.NINF]
y_corner = [np.nan, np.inf, np.NINF, np.nan, np.inf, np.NINF, np.inf, np.NINF, np.NINF]
expect   = [False , False , False  , False , False , False  , True  , False  , True   ]

The test compares the tensors element by element and puts each result into the output tensor.
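The expected values follow plain IEEE-754 comparison semantics. As a quick sanity check, here is a minimal standalone C++ snippet (not part of the test suite) exercising the same corner cases:

```cpp
#include <cmath>
#include <iostream>
#include <limits>

int main() {
    const float inf  = std::numeric_limits<float>::infinity();
    const float qnan = std::nanf("");
    std::cout << std::boolalpha
              << (inf == inf)   << '\n'   // true:  inf equals inf
              << (-inf == -inf) << '\n'   // true:  -inf equals -inf
              << (inf == -inf)  << '\n'   // false: opposite infinities differ
              << (qnan == qnan) << '\n';  // false: NaN never compares equal
}
```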
When the comparison runs on TEST_DEVICE=CPU, the plugin returns an unexpected False for the float16 comparisons inf==inf and -inf==-inf (the TF framework returns True). For float32 and float64 it returns the same results as the TF framework. TEST_DEVICE=GPU simply fails on these cases with an error for all three types.
Test command line: python -m pytest layer_tests\tensorflow_tests\test_tf_Equal.py --use_new_frontend -k float
What needs to be done?
Fix the comparison behavior described above.
Example Pull Requests
No response
Resources
- Contribution guide - start here!
- Intel DevHub Discord channel - engage in discussions, ask questions and talk to OpenVINO developers
- How to link your Pull Request to an issue
Contact points
sshlyapn, p-durandin
Ticket
No response
.take
Thank you for looking into this issue! Please let us know if you have any questions or require any help.
The first observation is that there are some wrong values in the ymm register for the input values of the Equal op: m256_f32 = {1.00000000, 1.00000000, 1.00000000, nan, nan, nan, 65504.0000, 65504.0000}, vs the weights, which have the correct values: m256_f32 = {nan, inf, -inf, nan, inf, -inf, inf, -inf}. Note that 65504 is the largest finite float16 value, which hints that the infinities were clamped somewhere.
Maybe the reason is the conversion op? But they both go through a conversion node...
UPD.
So, in the CPU case, the weights go in as constant nodes and are loaded correctly, but the conversion node for the input uses

static_cast<dst_t>(std::max(std::min(tmp[j], ubound), lbound));

where ubound and lbound are numeric_limits::max and numeric_limits::lowest respectively, which are finite positive and negative values, so this conversion clamps infinities to finite values.
This clamping apparently makes sense only for integral dst types; if the dst type is one of the floating-point types, then a plain static_cast can be used.
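As a sketch of what such a type-dependent conversion could look like (hypothetical names, not the actual OpenVINO code; the real ov::float16 is a class type, so the real dispatch would need a suitable trait instead of std::is_floating_point):

```cpp
#include <algorithm>
#include <limits>
#include <type_traits>

// Hypothetical per-element conversion illustrating the proposed fix.
template <typename dst_t, typename src_t>
dst_t convert_one(src_t v) {
    if constexpr (std::is_floating_point_v<dst_t>) {
        // Floating-point destinations can represent +/-inf themselves,
        // so a plain cast preserves infinities instead of clamping them.
        return static_cast<dst_t>(v);
    } else {
        // Integral destinations have no representation for infinities,
        // so clamping to [lowest, max] is still needed to avoid overflow.
        const auto ubound = static_cast<src_t>(std::numeric_limits<dst_t>::max());
        const auto lbound = static_cast<src_t>(std::numeric_limits<dst_t>::lowest());
        return static_cast<dst_t>(std::max(std::min(v, ubound), lbound));
    }
}
```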
@DannyVlasenko while you wait for a reply regarding the CPU, have you checked the GPU behavior?
Looks like I have a solution for the CPU, so I can make a separate PR for it, if that is OK.
For GPU, the only findings so far are that the kernel selector sets the EnableFP16Emulation flag because engineInfo.supports_fp16 is false. I suppose the cl_khr_fp16 extension should be enabled as a result, but I cannot find any code that does so, or even checks for this flag, down the line. If I comment out EnableFP16Emulation, then OpenCL unsurprisingly reports that "declaring variable of type 'half' is not allowed". I'm still investigating this; to be honest, I didn't get to it last week.
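For reference, a host-side capability check typically looks like the sketch below (standard OpenCL API only, not GPU-plugin code; device_supports_fp16 is a made-up helper name). Kernels that declare half variables additionally need #pragma OPENCL EXTENSION cl_khr_fp16 : enable.

```cpp
#include <CL/cl.h>
#include <cstring>
#include <vector>

// Sketch: query a device's extension string and look for cl_khr_fp16.
// Error handling is omitted for brevity.
bool device_supports_fp16(cl_device_id device) {
    size_t size = 0;
    clGetDeviceInfo(device, CL_DEVICE_EXTENSIONS, 0, nullptr, &size);
    if (size == 0)
        return false;
    std::vector<char> extensions(size);
    clGetDeviceInfo(device, CL_DEVICE_EXTENSIONS, size, extensions.data(), nullptr);
    return std::strstr(extensions.data(), "cl_khr_fp16") != nullptr;
}
```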
@DannyVlasenko, thank you for discovering the root cause of the test failure when it's run on the CPU plugin. I would agree that such clamping doesn't really reproduce the intended conversion behavior for floating-point types. Thus, at least for the template specializations for the ov::float16 type, it makes sense to remove this clamping, just as has been done for bfloat16.
@maxnick Should I make a separate PR for the CPU case? The GPU and CPU cases seem to be unrelated, and I'm still thinking about how to solve the GPU one.
@DannyVlasenko, yes, definitely; there is no point in making a joint PR for GPU and CPU.