
[Good First Issue]: Inconsistent behavior/errors of Equal operation in comparison with TF framework using inf, -inf

Open p-durandin opened this issue 1 year ago • 8 comments

Context

While implementing the layer test for the Equal operation of the TF framework, I found an issue with inf and -inf comparison.

The following tensor contents are used as source data; they cover the important corner cases for float values:

x_corner = [1.    , 1.    , 1.     , np.nan, np.nan, np.nan , np.inf, np.inf , np.NINF]
y_corner = [np.nan, np.inf, np.NINF, np.nan, np.inf, np.NINF, np.inf, np.NINF, np.NINF]
expect   = [false , false , false  , false , false , false  , true  , false  , true   ]

The test compares the tensors element by element and puts the result into the output tensor.

When TEST_DEVICE=CPU, the plugin returns an unexpected False for the float16 comparisons inf == inf and -inf == -inf (the TF framework returns True); for float32 and float64 it returns the same results as the TF framework. When TEST_DEVICE=GPU, the test fails with an error on these cases for all three types.

Test command line: python -m pytest layer_tests\tensorflow_tests\test_tf_Equal.py --use_new_frontend -k float

What needs to be done?

Fix the comparison so that OpenVINO matches the TF framework behavior for inf and -inf.

Example Pull Requests

No response

Resources

Contact points

sshlyapn, p-durandin

Ticket

No response

p-durandin avatar Apr 25 '24 12:04 p-durandin

.take

DannyVlasenko avatar May 03 '24 20:05 DannyVlasenko

Thank you for looking into this issue! Please let us know if you have any questions or require any help.

github-actions[bot] avatar May 03 '24 20:05 github-actions[bot]

The first observation is that there are some wrong values in the ymm register for the input values of the Equal op: m256_f32 = {1.00000000, 1.00000000, 1.00000000, nan, nan, nan, 65504.0000, 65504.0000}, whereas the weights have the correct values: m256_f32 = {nan, inf, -inf, nan, inf, -inf, inf, -inf}.

Maybe the reason is the conversion op? But they both go through a conversion node...

UPD. So, in the CPU case, the weights go in as constant nodes and are loaded correctly, but the conversion node for the input uses static_cast<dst_t>(std::max(std::min(tmp[j], ubound), lbound)); where ubound and lbound are numeric_limits::max and numeric_limits::lowest respectively. Those are positive and negative finite values, so this conversion clamps infinities to finite values.

This clamping apparently makes sense only for integral dst types; if the dst type is one of the floating point types, then a plain static_cast can be used.

DannyVlasenko avatar May 06 '24 22:05 DannyVlasenko

@DannyVlasenko while you wait for a reply from the CPU team, have you checked the GPU behavior?

p-durandin avatar May 20 '24 13:05 p-durandin

@DannyVlasenko while you wait for a reply from the CPU team, have you checked the GPU behavior?

Looks like I have a solution for CPU, so I can make a separate PR for it, if it is ok.

For GPU, the only findings so far are that the kernel selector sets the EnableFP16Emulation flag because engineInfo.supports_fp16 is false. I suppose the cl_khr_fp16 extension should be enabled as a result, but I cannot find any code that does so, or even checks for this flag, down the line. If I comment out EnableFP16Emulation, then OpenCL predictably complains that "declaring variable of type 'half' is not allowed". I'm currently investigating this; didn't do anything here last week tbh.
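For context, the "declaring variable of type 'half' is not allowed" error is exactly what OpenCL C reports when a kernel uses half without the fp16 extension enabled. A minimal hypothetical kernel (not the actual clDNN kernel) would need the pragma below, which in turn requires the device to expose cl_khr_fp16:

```c
/* Hypothetical OpenCL C kernel fragment: without the pragma (or on a
 * device lacking cl_khr_fp16), declaring a 'half' variable is rejected
 * by the OpenCL compiler. */
#pragma OPENCL EXTENSION cl_khr_fp16 : enable

__kernel void equal_f16(__global const half *x,
                        __global const half *y,
                        __global uchar *out) {
    const size_t i = get_global_id(0);
    out[i] = (x[i] == y[i]) ? 1 : 0;
}
```

This fragment is device-dependent and cannot run standalone; it only illustrates where the extension check would matter.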

DannyVlasenko avatar May 20 '24 17:05 DannyVlasenko


@DannyVlasenko , thank you for discovering the root cause of the test failure when it's run on the CPU plugin. I would rather agree that such clamping doesn't really reproduce the intermediate conversion behavior for floating point types. Thus, at least for the template specializations for ov::float16 type it does make sense to remove this clamping, just like it's been done for bfloat16.

maxnick avatar May 22 '24 10:05 maxnick


@maxnick Should I make a separate PR for the CPU case? The GPU and CPU cases seem to be unrelated, and I'm still thinking about how to solve the GPU case.

DannyVlasenko avatar May 22 '24 10:05 DannyVlasenko


@DannyVlasenko , yes, definitely, there is no point in making a joint PR for GPU and CPU.

maxnick avatar May 22 '24 10:05 maxnick