DirectXShaderCompiler icon indicating copy to clipboard operation
DirectXShaderCompiler copied to clipboard

More aggressive reassociations

Open lizhengxing opened this issue 1 year ago • 3 comments

Although DXC applied the upstream change, Reassociate: add global reassociation algorithm (https://github.com/llvm/llvm-project/commit/b8a330c) in this PR (https://github.com/microsoft/DirectXShaderCompiler/pull/6598), it still might overlook some obvious common factors.

One case has been observed is:

  %Float4_0   = call %dx.types.CBufRet.f32 @dx.op.cbufferLoadLegacy.f32(i32 59, %dx.types.Handle %1, i32 1)
  %Float4_0.w = extractvalue %dx.types.CBufRet.f32 %Float4_0, 3
  %Float2_0   = call %dx.types.CBufRet.f32 @dx.op.cbufferLoadLegacy.f32(i32 59, %dx.types.Handle %1, i32 0)
  %Float2_0.y = extractvalue %dx.types.CBufRet.f32 %Float2_0, 1

  /* %Float4_1 is redundant with %Float4_0 since they invokes cbufferLoadLegacy with same parameters */
  %Float4_1   = call %dx.types.CBufRet.f32 @dx.op.cbufferLoadLegacy.f32(i32 59, %dx.types.Handle %1, i32 1)

  /* %Float4_1.w is redundant with %Float4_0.w */
  %Float4_1.w = extractvalue %dx.types.CBufRet.f32 %Float4_1, 3 

  /* %Float2_1 is redundant with %Float2_0 since they invokes cbufferLoadLegacy with same parameters */
  %Float2_1   = call %dx.types.CBufRet.f32 @dx.op.cbufferLoadLegacy.f32(i32 59, %dx.types.Handle %1, i32 0)

  /* %Float2_1.y is redundant with %Float2_0.y */
  %Float2_1.y = extractvalue %dx.types.CBufRet.f32 %Float2_1, 1

  ....
  %11 = fmul fast float %Float4_0.w, %10
  %12 = fmul fast float %11, %Float2_0.y
  ....
  %14 = fmul fast float %Float4_1.w, %13
  %15 = fmul fast float %14, %Float2_1.y 

(%Float4_0.w * %Float2_0.y) equals to (%Float4_1.w * %Float2_1.y), they should be reassociated to a common factor

The upstream change can't identify this common factor because DXC doesn't know (%Float4_0.w, %Float4_1.w) and (%Float2_0.y, %Float2_1.y) are redundant when running Reassociate pass. Those redundancies will be eliminated in GVN pass.

For DXC can identify more common factors, this PR will aggressively run Reassociate pass again after GVN pass and then run GVN pass again to remove the redundancies generared in this run of Reassociate pass.

Changing the order of floating point operations causes the precision issue. In case some shaders get unexpected results due to this PR, use "-opt-disable aggressive-reassociation" to disable this PR and roll back.

This is part 3 of the fix for https://github.com/microsoft/DirectXShaderCompiler/issues/6593.

lizhengxing avatar May 15 '24 18:05 lizhengxing

In your message you say

DXC doesn't know (%3, %7) and (%7, %9) are redundant

How are %7 and %9 redundant here? Should it be %5 and %9?

It would be easier to read this if you gave them names I think.

dmpots avatar May 15 '24 19:05 dmpots

In your message you say

DXC doesn't know (%3, %7) and (%7, %9) are redundant

How are %7 and %9 redundant here? Should it be %5 and %9?

It would be easier to read this if you gave them names I think.

Yes, %5 and %9 are redundant. I simplified the dxils and renamed the virtual register number. I'll update the description.

lizhengxing avatar May 16 '24 16:05 lizhengxing

In your message you say

DXC doesn't know (%3, %7) and (%7, %9) are redundant

How are %7 and %9 redundant here? Should it be %5 and %9?

It would be easier to read this if you gave them names I think.

Done.

lizhengxing avatar May 16 '24 22:05 lizhengxing

We should look at the performance impact of this change before committing it.

Overall, compared to the upstream change (https://github.com/microsoft/DirectXShaderCompiler/pull/6598), this change can reduce ALU in ~37% of shaders. The occupancy of a small number of shaders are improved (~1%). There were an equal number of regressions and improvements.

Overall, this change looks to be positive.

lizhengxing avatar May 21 '24 17:05 lizhengxing