xla
Allow fusing epilogues whose operands are broadcasts of effective-scalar instructions.
This enables creating fusions for fp8 where the pattern is mul(dot, scalar_ops) and the shapes of the scalar ops are either [] or [1]. It only affects epilogues; the operands of the broadcast still follow the existing fusion rules. Both the Triton and cuDNN backends support this kind of fusion.
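
For reviewers unfamiliar with the term, here is a rough Python sketch of what "broadcast of an effective scalar" means for an epilogue operand. It is illustrative only: the function names and the tuple-based shape representation are hypothetical and are not XLA's API, and the actual check in this change is done in XLA's C++ fusion code. The sketch assumes that "effective scalar" means a shape holding exactly one element, which covers the [] and [1] cases mentioned above.

```python
from math import prod


def is_effective_scalar(shape):
    """Return True when the shape holds exactly one element.

    Hypothetical helper: shapes are plain tuples of ints, so () stands
    for [] and (1,) stands for [1]. math.prod of an empty tuple is 1.
    """
    return prod(shape) == 1


def is_broadcast_of_effective_scalar(opcode, operand_shape):
    """Return True for a broadcast whose input is an effective scalar.

    Hypothetical helper: `opcode` is the epilogue operand's opcode as a
    string and `operand_shape` is the shape of the broadcast's input.
    """
    return opcode == "broadcast" and is_effective_scalar(operand_shape)


# mul(dot, broadcast(scale)) with scale of shape [] or [1] qualifies.
print(is_broadcast_of_effective_scalar("broadcast", ()))
print(is_broadcast_of_effective_scalar("broadcast", (1,)))
# A broadcast of a real vector does not qualify under this sketch.
print(is_broadcast_of_effective_scalar("broadcast", (128,)))
```

In this sketch only the epilogue operand is inspected; whatever feeds the broadcast would still be subject to the existing fusion rules, as described above.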
cc @sergachev