xla
[XLA:GPU] Remove (now unnecessary) Triton-specific kernel reuse
Now that we have general fusion kernel reuse, the Triton-specific reuse is no longer needed.
This change should not have any runtime performance impact.
In theory, compile time could become slightly slower, because the reuse criteria of the generic kernel reuse are less permissive (for example, it generates different kernels for different alignments). However, our tests didn't show a noticeable slowdown.
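
To illustrate why a stricter reuse criterion can mean more kernels to compile, here is a minimal sketch. It is not XLA's actual implementation: `Fusion`, `ReuseKey`, and `CountCompiledKernels` are hypothetical names, and the key format is an assumption made only for this example.

```cpp
#include <cstddef>
#include <cstdint>
#include <set>
#include <string>
#include <vector>

// Hypothetical stand-in for a fusion; this is not an XLA type.
struct Fusion {
  std::string computation;              // canonical text of the fused computation
  std::vector<int64_t> arg_alignments;  // byte alignment of each buffer argument
};

// Builds a reuse key. With include_alignment == true, two fusions share a
// kernel only if their argument alignments also match (the stricter criterion).
std::string ReuseKey(const Fusion& fusion, bool include_alignment) {
  std::string key = fusion.computation;
  if (include_alignment) {
    for (int64_t alignment : fusion.arg_alignments) {
      key += "|align=" + std::to_string(alignment);
    }
  }
  return key;
}

// Returns the number of distinct kernels that would have to be compiled.
std::size_t CountCompiledKernels(const std::vector<Fusion>& fusions,
                                 bool include_alignment) {
  std::set<std::string> keys;
  for (const Fusion& fusion : fusions) {
    keys.insert(ReuseKey(fusion, include_alignment));
  }
  return keys.size();
}
```

In this sketch, two fusions with the same computation but different argument alignments map to one kernel under the permissive key and to two kernels under the alignment-aware key. That is the source of the theoretical compile-time cost.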