tfjs
tfjs copied to clipboard
webgl: Fix NaN issue
Fix #6822
Problem
1: On some GPUs, even if a
and b
are both non-NaN, the value of isNaN
in vec4 isNaN = min(vec4(isnan(a)) + vec4(isnan(b)), vec4(1.0));
are still larger than 0.
, which misleads all values become NAN
.
2: After resolving NAN
issue, the result is still incorrect. It seems that the isnan_custom
is not well supported on the problem GPU. After switching back to builtin isnan
, everything works well.
Solution:
Use the bool type bvec4
instead of float type vec4
to calculate isNaN
to avoid the the float precision issue when comparing with zero.
Meanwhile, add an env flag WEBGL_ISNAN_CUSTOM
to allow user to specify which isnan
to use.
To see the logs from the Cloud Build CI, please join either our discussion or announcement mailing list.
@vladmandic @shurshilov Please have the last test with this PR. I also make it ready for review. The verified solution https://github.com/tensorflow/tfjs/issues/6822#issuecomment-1247460709:
vec4 nanValue = a;
bvec4 isNaN = isnan(a);
result.r = isNaN.r ? nanValue.r : result.r;
result.g = isNaN.g ? nanValue.g : result.g;
result.b = isNaN.b ? nanValue.b : result.b;
result.a = isNaN.a ? nanValue.a : result.a;
nanValue = b;
isNaN = isnan(b);
result.r = isNaN.r ? nanValue.r : result.r;
result.g = isNaN.g ? nanValue.g : result.g;
result.b = isNaN.b ? nanValue.b : result.b;
result.a = isNaN.a ? nanValue.a : result.a;
return result;
The solution in this PR:
bvec4 isNaNA = isnan(a);
bvec4 isNaNB = isnan(b);
bvec4 isNaN = bvec4(isNaNA.x || isNaNB.x, isNaNA.y || isNaNB.y, isNaNA.z || isNaNB.z, isNaNA.w || isNaNB.w);
result.r = isNaN.r ? NAN : result.r;
result.g = isNaN.g ? NAN : result.g;
result.b = isNaN.b ? NAN : result.b;
result.a = isNaN.a ? NAN : result.a;
return result;
I think the second one should also be workable. But if not, I need to revert this change Use NAN instead of nanValue.xxx
Kindly ping you guys. Also add @Linchenn in case you are in vacation.
should this default to true to match the previous behavior?
if previous behavior is proven to be broken on some gpus, why?
afaik, only reason for isnan_custom
is because webgl1 does not have native isnan
, so why not enable it for webgl1 and disable otherwise. we should avoid creating new flags that nobody will remember few months from now.
@qjia7 It sounds the correctness of the builtin/polyfilled isnan
can be programmatically probed with a small snippet and then the right one could be picked up automatically without "creating new flags that nobody will remember few months from now", or no?
@qjia7 It sounds the correctness of the builtin/polyfilled
isnan
can be programmatically probed with a small snippet and then the right one could be picked up automatically without "creating new flags that nobody will remember few months from now", or no?
In fact, It's hard to construct a small case to reflect the isnan
issue. The current solution is to always use the builtin isnan
. I know you may have concern about the comment here Use custom isnan definition to work across differences between implementations on various platforms.
. But I think it's compare to the old custom isnan
[return (val > 0. || val < 1. || val == 0.) ? false : true;
]. So I prefer we directly use the builtin isnan for simplification.