Enabling shape uniforms gives incorrect output with MatMulPackedProgram
System information
- Have I written custom code (as opposed to using a stock example script provided in TensorFlow.js): Yes
- OS Platform and Distribution (e.g., Linux Ubuntu 16.04): MacOS Monterey (12.4)
- Mobile device (e.g. iPhone 8, Pixel 2, Samsung Galaxy) if the issue happens on mobile device: Reproduces on both desktop and mobile
- TensorFlow.js installed from (npm or script link): built from source
- TensorFlow.js version (use command below): 3.16.0
- Browser version: Chrome 102.0.5005.61
- Tensorflow.js Converter Version: N/A
Describe the current behavior
- https://github.com/tensorflow/tfjs/pull/5502 fixed some issues with providing shapes as uniforms.
- However, we are still getting incorrect outputs for MatMulPackedProgram when setting WEBGL_USE_SHAPES_UNIFORMS to true.
- Upon investigation, we are getting the same shader key for 2 shaders with different shader code. The shader code differs in this if statement: https://github.com/tensorflow/tfjs/blob/8c7fd026bb9940c926a94f70d7bee5ef1f51a1ef/tfjs-backend-webgl/src/shader_compiler.ts#L1058
- In one shader the if path is taken (i.e. texShape != null && util.arraysEqual(shape, texShape) == true) while it is not taken in the other shader. So far, we have only reproduced this for MatMulPackedProgram.
- MatMulPackedProgram takes a 3D input (batch, dim0, dim1) which is then "squeezed down" to 2D when batch == 1. In https://github.com/tensorflow/tfjs/blob/8c7fd026bb9940c926a94f70d7bee5ef1f51a1ef/tfjs-backend-webgl/src/gpgpu_math.ts#L430, x.shape is 3-dimensional, while xTexShape is 2-dimensional. Hence, isLogicalShapTexShapeEqual is always false for MatMulPackedProgram even if the input shape and texShape match exactly after dropping the first (batch) dimension.
- As a result, if we have 2 MatMulPackedPrograms where all the parameters for shader key generation match except for isLogicalShapTexShapeEqual, the programs point to the same compiled shader instead of 2 separate shaders. Based on which shader is compiled first, the other produces incorrect outputs.
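To make the suspected collision concrete, here is a minimal sketch in plain JavaScript. It is a toy model, not the TensorFlow.js code: compileShader, makeKey, getShader, squeezeBatch and the key format are all hypothetical names made up for illustration. The only thing it assumes from the description above is that the compiler branches on the squeezed shape while the key flag compares the unsqueezed shape.

```javascript
// Toy model of a shader cache. None of these names are TensorFlow.js APIs.
const arraysEqual = (a, b) =>
  a.length === b.length && a.every((v, i) => v === b[i]);

// Drop a leading batch dimension of size 1 (the "squeeze" described above).
const squeezeBatch = (shape) =>
  shape.length === 3 && shape[0] === 1 ? shape.slice(1) : shape;

// The "compiler" picks a branch by comparing the squeezed shape to texShape.
function compileShader(shape, texShape) {
  const fastPath = arraysEqual(squeezeBatch(shape), texShape);
  return fastPath ? 'sampler: direct texel fetch' : 'sampler: generic index math';
}

// The key flag compares the unsqueezed shape, so for 3D input it is always false.
function makeKey(shape, texShape) {
  return `MatMul_rank${shape.length}_${arraysEqual(shape, texShape)}`;
}

const cache = new Map();
function getShader(shape, texShape) {
  const key = makeKey(shape, texShape);
  if (!cache.has(key)) {
    cache.set(key, compileShader(shape, texShape));
  }
  return cache.get(key);
}

const first = getShader([1, 4, 6], [4, 6]);  // squeezed shape matches texShape
const second = getShader([1, 8, 3], [4, 6]); // squeezed shape does not match

// Both lookups hit the same key, so the second reuses the first shader
// even though compiling it on its own would take the other branch.
console.log(first === second);
console.log(compileShader([1, 8, 3], [4, 6]) === second);
```

In this model the first comparison logs true and the second logs false, which is the "same key, different shader code" situation described above.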
Standalone code to reproduce the issue
- I haven't been able to reproduce this in an existing open-source model. It only reproduces in our internal model.
Possible fix
- In https://github.com/tensorflow/tfjs/blob/8c7fd026bb9940c926a94f70d7bee5ef1f51a1ef/tfjs-backend-webgl/src/gpgpu_math.ts#L430, changing util.arraysEqual(x.shape, xTexShape); to util.arraysEqual(uniformShape, xTexShape); fixes the issue. And if I'm understanding the code correctly, we should use uniformShape in place of x.shape throughout that if block.
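A small sketch of what the proposed change does to the key flag, in plain JavaScript. This assumes uniformShape is the logical shape with its size-1 dimensions removed; toUniformShape is a hypothetical stand-in for however the backend actually derives it, and the shapes are made up.

```javascript
// Element-wise array comparison, standing in for util.arraysEqual.
const arraysEqual = (a, b) =>
  a.length === b.length && a.every((v, i) => v === b[i]);

// Assumption: uniformShape is the logical shape with size-1 dimensions dropped.
const toUniformShape = (shape) => shape.filter((d) => d !== 1);

const xShape = [1, 4, 6]; // 3D logical shape with batch == 1
const xTexShape = [4, 6]; // 2D texture shape

// Current comparison: a 3D shape against a 2D shape, which can never match.
const flagBefore = arraysEqual(xShape, xTexShape);

// Proposed comparison: the squeezed 2D shape against the 2D texture shape.
const flagAfter = arraysEqual(toUniformShape(xShape), xTexShape);

console.log(flagBefore, flagAfter); // false true
```

With the squeezed shape, the flag in the key can actually distinguish the two cases, so the two programs would no longer share a key.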
Thank you @shanumantesc for the detailed investigation! Let me check this.
@Linchenn did you get a chance to look into this further?
Sorry for the delay. I will check it soon and share updates when I have them.
@shanumantesc Could you provide the code that returns wrong results? I just tried the following code, but the results are correct:
const a = tf.tensor3d([1, 2, 3, 4, 5, 6], [1, 2, 3]);
const b = tf.tensor3d([0, 1, -3, 2, 2, 1], [1, 3, 2]);
tf.env().set('WEBGL_USE_SHAPES_UNIFORMS', false);
let c = tf.matMul(a, b);
expect(c.shape).toEqual([1, 2, 2]);
expectArraysClose(await c.data(), [0, 8, -3, 20]);
tf.env().set('WEBGL_USE_SHAPES_UNIFORMS', true);
c = tf.matMul(a, b);
expect(c.shape).toEqual([1, 2, 2]);
expectArraysClose(await c.data(), [0, 8, -3, 20]);
About the cache key you mentioned, the two MatMul shaders use different cache keys:
- the first one (WEBGL_USE_SHAPES_UNIFORMS as false) uses:
MatMulPackedProgram_1,2,3_2,4_false1,3,2_4,2_false1,2,2_2,2_false_
// Don't use uniform for sharedDimensionPacked for performance.
const float sharedDimension = 2.0;
vec4 dot2x2ARowBCol(ivec3 rc) {
vec4 result = vec4(0);
for (int i = 0; i < 2; i++) {
int batchA = rc.x;
int batchB = rc.x;
vec4 a = getMatrixA(batchA, rc.y, i * 2);
vec4 b = getMatrixB(batchB, i * 2, rc.z);
// These swizzled products need to be separately added.
// See: https://github.com/tensorflow/tfjs/issues/1735
result += (a.xxzz * b.xyxy);
result += (a.yyww * b.zwzw);
}
return result;
}
void main() {
ivec3 rc = getOutputCoords();
vec4 result = dot2x2ARowBCol(rc);
setOutput(result);
}
2
- the second one (WEBGL_USE_SHAPES_UNIFORMS as true) uses:
MatMulPackedProgram_3_false_1,2_2_false__false_____false3_false_1,2_2_false__false_____false3_false_1,2_2_false__false_____false_
// Don't use uniform for sharedDimensionPacked for performance.
const float sharedDimension = 2.0;
vec4 dot2x2ARowBCol(ivec3 rc) {
vec4 result = vec4(0);
for (int i = 0; i < 2; i++) {
int batchA = rc.x;
int batchB = rc.x;
vec4 a = getMatrixA(batchA, rc.y, i * 2);
vec4 b = getMatrixB(batchB, i * 2, rc.z);
// These swizzled products need to be separately added.
// See: https://github.com/tensorflow/tfjs/issues/1735
result += (a.xxzz * b.xyxy);
result += (a.yyww * b.zwzw);
}
return result;
}
void main() {
ivec3 rc = getOutputCoords();
vec4 result = dot2x2ARowBCol(rc);
setOutput(result);
}
2
From my understanding, the shader compiler here uses the tensor's logical shape (un-squeezed shape): https://github.com/tensorflow/tfjs/blob/8c7fd026bb9940c926a94f70d7bee5ef1f51a1ef/tfjs-backend-webgl/src/shader_compiler.ts#L1058
As a result, the cache key construction here is also supposed to use the tensor's logical shape (un-squeezed shape), instead of the uniform shape (squeezed shape): https://github.com/tensorflow/tfjs/blob/8c7fd026bb9940c926a94f70d7bee5ef1f51a1ef/tfjs-backend-webgl/src/gpgpu_math.ts#L430-L431
@Linchenn I don't immediately have a way to reproduce, but I'll try to see if I can generate some dummy shapes where I can reproduce the issue.
In the meantime, I think using the logical shape (un-squeezed shape) in isLogicalShapTexShapeEqual breaks for the following reason:
texShape in https://github.com/tensorflow/tfjs/blob/8c7fd026bb9940c926a94f70d7bee5ef1f51a1ef/tfjs-backend-webgl/src/shader_compiler.ts#L1058 is 2d.
For MatMulPackedProgram, we end up calling https://github.com/tensorflow/tfjs/blob/8c7fd026bb9940c926a94f70d7bee5ef1f51a1ef/tfjs-backend-webgl/src/shader_compiler.ts#L1228 because the logical shape is 3D with batch size = 1.
Now, in https://github.com/tensorflow/tfjs/blob/8c7fd026bb9940c926a94f70d7bee5ef1f51a1ef/tfjs-backend-webgl/src/shader_compiler.ts#L1230 the logical shape gets squeezed to 2D and we call getPackedSampler2D with a 2D logical shape.
And this is where the discrepancy arises because if the squeezed logical and texture shapes match, the shader key would still have the value as false, and so an incorrect shader is picked.
To reproduce this issue, we need to find inputs where logical shape and texture shape match exactly for matmul. The logic to decide the physical shape seems a bit complicated when I read it, so it isn't straightforward to produce a failure case.
@Linchenn thanks to your code snippet above, I found the following matmul shapes from our network where the output doesn't match with shape uniforms turned on and off:
- 102400x96 ; 96x4
- 102400x96 ; 96x2
const shape0 = [1, 102400, 96];
const shape1 = [1, 96, 94];
const a = tf.randomNormal(shape0);
const b = tf.randomNormal(shape1);
tf.env().set('WEBGL_USE_SHAPES_UNIFORMS', false);
let c0 = tf.matMul(a, b);
tf.env().set('WEBGL_USE_SHAPES_UNIFORMS', true);
let c1 = tf.matMul(a, b);
if (tf.any(tf.greater(tf.abs(c0.sub(c1)), tf.scalar(1e-2))).dataSync()[0]) {
console.log("Failed");
}
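When comparing the two outputs, it may help to report how large the mismatch is rather than only whether one exists. A minimal sketch in plain JavaScript, operating on typed arrays such as those returned by dataSync(); maxAbsDiff is a hypothetical helper, not a TensorFlow.js API, and the sample values are made up.

```javascript
// Hypothetical helper: largest absolute element-wise difference of two arrays.
function maxAbsDiff(a, b) {
  if (a.length !== b.length) {
    throw new Error(`Length mismatch: ${a.length} vs ${b.length}`);
  }
  let max = 0;
  for (let i = 0; i < a.length; i++) {
    max = Math.max(max, Math.abs(a[i] - b[i]));
  }
  return max;
}

// Stand-ins for c0.dataSync() and c1.dataSync().
const out0 = Float32Array.from([0, 8, -3, 20]);
const out1 = Float32Array.from([0, 8, -3, 20.5]);

console.log(maxAbsDiff(out0, out1)); // 0.5
```

Logging this value for each shape pair would show whether a failure is a small numerical drift or a completely different result, which is what a wrong shader would be expected to produce.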
Hi @Linchenn, just following up on this one: were you able to reproduce the issue with the above steps? If yes, do you think changing util.arraysEqual(x.shape, xTexShape); to util.arraysEqual(uniformShape, xTexShape); is the correct fix?
@shanumantesc Sorry, I could not reproduce this:

Did I miss something?
@Linchenn you're right, this doesn't reproduce for me either 😕 I wonder if I was on some incorrect TFJS version, but will try to reproduce it locally once again and get back to you.
@shanumantesc If you find the error again, feel free to re-open this.