tfjs Enabling shape uniforms gives incorrect output with MatMulPackedProgram

System information

Have I written custom code (as opposed to using a stock example script provided in TensorFlow.js): Yes
OS Platform and Distribution (e.g., Linux Ubuntu 16.04): MacOS Monterey (12.4)
Mobile device (e.g. iPhone 8, Pixel 2, Samsung Galaxy) if the issue happens on mobile device: Reproduces on both desktop and mobile
TensorFlow.js installed from (npm or script link): built from source
TensorFlow.js version (use command below): 3.16.0
Browser version: Chrome 102.0.5005.61
Tensorflow.js Converter Version: N/A

Describe the current behavior

https://github.com/tensorflow/tfjs/pull/5502 fixed some issues with providing shapes as uniforms.
However, we are still getting incorrect outputs for MatMulPackedProgram when setting WEBGL_USE_SHAPES_UNIFORMS to true.
Upon investigation, we are getting the same shader key for 2 shaders with different shader code. The shader code differs in this if statement https://github.com/tensorflow/tfjs/blob/8c7fd026bb9940c926a94f70d7bee5ef1f51a1ef/tfjs-backend-webgl/src/shader_compiler.ts#L1058
In one shader the if path is taken (ie texShape != null && util.arraysEqual(shape, texShape) == true) while it is not taken in the other shader. So far, we have only reproduced this for MatMulPackedProgram.
MatMulPackedProgram takes a 3d input (batch, dim0, dim1) which is then "squeezed down" to 2d when batch == 1. In https://github.com/tensorflow/tfjs/blob/8c7fd026bb9940c926a94f70d7bee5ef1f51a1ef/tfjs-backend-webgl/src/gpgpu_math.ts#L430, x.shape is 3-dimensional, while xTexShape is 2-dimensional. Hence, isLogicalShapTexShapeEqual is always false for MatMulPackedProgram even if the input shape and texShape match exactly after dropping the first (batch) dimension.
As a result, if we have 2 MatMulPackedPrograms , where all the parameters for shader key generation match except for isLogicalShapTexShapeEqual , the programs point to the same compiled shader instead of 2 separate shaders. Based on which shader is compiled first, the other produces incorrect outputs.

Standalone code to reproduce the issue

I haven't been able to reproduce this in an existing open-source model. It only reproduces in our internal model.

Possible fix

In https://github.com/tensorflow/tfjs/blob/8c7fd026bb9940c926a94f70d7bee5ef1f51a1ef/tfjs-backend-webgl/src/gpgpu_math.ts#L430, changing util.arraysEqual(x.shape, xTexShape); to util.arraysEqual(uniformShape, xTexShape); fixes the issue. And if I'm understanding the code correctly, we should use uniformShape in place of x.shape throughout that if block.

Jun 01 '22 10:06 shanumante-sc

Thank you shanumantesc for the detailed investigation! Let me check this.

Jun 01 '22 22:06 Linchenn

@Linchenn did you get a chance to look into this further?

Jun 08 '22 07:06 shanumante-sc

Sorry for the delay. I will check it soon and will share you the updates when I have.

Jun 08 '22 07:06 Linchenn

@shanumantesc Could you provide the codes that returns wrong results? I just tried the following codes but the results are correct:

    const a = tf.tensor3d([1, 2, 3, 4, 5, 6], [1, 2, 3]);
    const b = tf.tensor3d([0, 1, -3, 2, 2, 1], [1, 3, 2]);

    tf.env().set('WEBGL_USE_SHAPES_UNIFORMS', false);
    let c = tf.matMul(a, b);
    expect(c.shape).toEqual([1, 2, 2]);
    expectArraysClose(await c.data(), [0, 8, -3, 20]);

    tf.env().set('WEBGL_USE_SHAPES_UNIFORMS', true);
    c = tf.matMul(a, b);
    expect(c.shape).toEqual([1, 2, 2]);
    expectArraysClose(await c.data(), [0, 8, -3, 20]);

About the cache key you mentioned, the two MatMul shaders use different cache keys:

the first one (WEBGL_USE_SHAPES_UNIFORMS as false) uses:

MatMulPackedProgram_1,2,3_2,4_false1,3,2_4,2_false1,2,2_2,2_false_
      
      // Don't use uniform for sharedDimensionPacked for performance.
      const float sharedDimension = 2.0;

      vec4 dot2x2ARowBCol(ivec3 rc) {
        vec4 result = vec4(0);
        for (int i = 0; i < 2; i++) {
          int batchA = rc.x;
          int batchB = rc.x;
          vec4 a = getMatrixA(batchA, rc.y, i * 2);
          vec4 b = getMatrixB(batchB, i * 2, rc.z);

          // These swizzled products need to be separately added.
          // See: https://github.com/tensorflow/tfjs/issues/1735
          result += (a.xxzz * b.xyxy);
          result += (a.yyww * b.zwzw);
        }
        return result;
      }

      void main() {
        ivec3 rc = getOutputCoords();
        vec4 result = dot2x2ARowBCol(rc);

        

        

        setOutput(result);
      }
    2

the second one (WEBGL_USE_SHAPES_UNIFORMS as true) uses:

MatMulPackedProgram_3_false_1,2_2_false__false_____false3_false_1,2_2_false__false_____false3_false_1,2_2_false__false_____false_
      
      // Don't use uniform for sharedDimensionPacked for performance.
      const float sharedDimension = 2.0;

      vec4 dot2x2ARowBCol(ivec3 rc) {
        vec4 result = vec4(0);
        for (int i = 0; i < 2; i++) {
          int batchA = rc.x;
          int batchB = rc.x;
          vec4 a = getMatrixA(batchA, rc.y, i * 2);
          vec4 b = getMatrixB(batchB, i * 2, rc.z);

          // These swizzled products need to be separately added.
          // See: https://github.com/tensorflow/tfjs/issues/1735
          result += (a.xxzz * b.xyxy);
          result += (a.yyww * b.zwzw);
        }
        return result;
      }

      void main() {
        ivec3 rc = getOutputCoords();
        vec4 result = dot2x2ARowBCol(rc);

        

        

        setOutput(result);
      }
    2

Aug 01 '22 00:08 Linchenn

From my understanding, the shader compiler here uses tensor's logic shape (un-squeezed shape): https://github.com/tensorflow/tfjs/blob/8c7fd026bb9940c926a94f70d7bee5ef1f51a1ef/tfjs-backend-webgl/src/shader_compiler.ts#L1058

As the result, the cache key construction here are also supposed to use tensor's logic shape (un-squeezed shape), instead of the uniform shape (squeezed shape): https://github.com/tensorflow/tfjs/blob/8c7fd026bb9940c926a94f70d7bee5ef1f51a1ef/tfjs-backend-webgl/src/gpgpu_math.ts#L430-L431

Aug 01 '22 00:08 Linchenn

@Linchenn I don't immediately have a way way to reproduce, but I'll try to see if I can generate some dummy shapes where I can reproduce the issue.

In the meanwhile, I think using logical shape (un-squeezed shape) in isLogicalShapTexShapeEqual breaks for the following reason:

texShape in https://github.com/tensorflow/tfjs/blob/8c7fd026bb9940c926a94f70d7bee5ef1f51a1ef/tfjs-backend-webgl/src/shader_compiler.ts#L1058 is 2d.

For PackedMatMulProgram, we end up calling https://github.com/tensorflow/tfjs/blob/8c7fd026bb9940c926a94f70d7bee5ef1f51a1ef/tfjs-backend-webgl/src/shader_compiler.ts#L1228 because the logical shape is 3D with batch size = 1.

Now, in https://github.com/tensorflow/tfjs/blob/8c7fd026bb9940c926a94f70d7bee5ef1f51a1ef/tfjs-backend-webgl/src/shader_compiler.ts#L1230 the logical shape gets squeezed to 2D and we call getPackedSampler2D with a 2D logical shape.

And this is where the discrepancy arises because if the squeezed logical and texture shapes match, the shader key would still have the value as false, and so an incorrect shader is picked.

To reproduce this issue, we need to find inputs where logical shape and texture shape match exactly for matmul. The logic to decide the physical shape seems a bit complicated when I read it, so it isn't straightforward to produce a failure case.

Aug 01 '22 03:08 shanumante-sc

@Linchenn thanks to your code snippet above, I found the following matmul shapes from our network where the output doesn't match with shape uniforms turned on and off:

102400x96 ; 96x4
102400x96 ; 96x2

shape0 = [1, 102400, 96];
shape1 = [1, 96, 94]
const a = tf.randomNormal(shape0);
const b = tf.randomNormal(shape1);

tf.env().set('WEBGL_USE_SHAPES_UNIFORMS', false);
let c0 = tf.matMul(a, b);

tf.env().set('WEBGL_USE_SHAPES_UNIFORMS', true);
let c1 = tf.matMul(a, b);

if (tf.any(tf.greater(tf.abs(c0.sub(c1)), tf.scalar(1e-2))).dataSync()[0]) {
  console.log("Failed");
}

Aug 01 '22 03:08 shanumante-sc

Hi @Linchenn , just following up on this one if you were able to reproduce the issue with the above steps. And if yes, do you think changing util.arraysEqual(x.shape, xTexShape); to util.arraysEqual(uniformShape, xTexShape); is the correct fix?

Aug 05 '22 22:08 shanumante-sc

@shanumantesc Sorry, I could not reproduce this:

Did I miss something?

Aug 31 '22 21:08 Linchenn

@Linchenn you're right, this doesn't reproduce for me either 😕 I wonder if I was on some incorrect TFJS version, but will try to reproduce it locally once again and get back to you.

Aug 31 '22 21:08 shanumante-sc

@shanumantesc If you find the error again, feel free to re-open this.

Oct 05 '22 17:10 Linchenn

Are you satisfied with the resolution of your issue? Yes No

Oct 05 '22 17:10 google-ml-butler[bot]