[NVIDIA TF] Part 1: Stream executor supports cudnn matmul fusion

Open kaixih opened this issue 3 years ago • 3 comments

This PR enables the cudnn matmul fusion backend to support generic matmul fusion patterns. Specifically, it focuses on the matmul+bias+gelu_exact pattern. (Note: matmul+bias+gelu_approximate is already supported by the cublasLt backend; see https://github.com/tensorflow/tensorflow/pull/55966.)
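For readers unfamiliar with the pattern, here is a minimal unfused reference sketch of what matmul+bias+gelu computes, and how the exact (erf-based) GELU differs from the tanh approximation. It is plain Python for illustration only, not the stream executor or cudnn code in this PR; the function names `gelu_exact`, `gelu_approximate`, and `matmul_bias_gelu` are hypothetical and chosen just for this example.

```python
import math


def gelu_exact(x):
    # Exact GELU: x * Phi(x), with Phi the standard normal CDF written via erf.
    return 0.5 * x * (1.0 + math.erf(x / math.sqrt(2.0)))


def gelu_approximate(x):
    # Widely used tanh approximation of GELU.
    inner = math.sqrt(2.0 / math.pi) * (x + 0.044715 * x ** 3)
    return 0.5 * x * (1.0 + math.tanh(inner))


def matmul_bias_gelu(a, b, bias, activation=gelu_exact):
    # Unfused reference for activation(a @ b + bias).
    # a is [rows][inner], b is [inner][cols], bias is [cols] (broadcast over rows).
    rows, inner, cols = len(a), len(b), len(b[0])
    out = []
    for i in range(rows):
        row = []
        for j in range(cols):
            acc = sum(a[i][k] * b[k][j] for k in range(inner))
            row.append(activation(acc + bias[j]))
        out.append(row)
    return out
```

A fused backend would produce the same values in a single kernel instead of three separate steps; the two activations agree closely but not bit-for-bit, which is why the exact and approximate patterns are treated separately.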

Part 1: Stream executor supports cudnn matmul fusion. (This one)
Part 2: Fused matmul op supports cudnn matmul fusion.
Part 3: Grappler graph pass supports matmul+bias+gelu_exact.

cc. @nluehr @pjannaty

kaixih avatar Jul 19 '22 22:07 kaixih

Can we also hyperlink the descendant PRs (e.g. Part 2: Fused matmul op supports cudnn matmul fusion), both here and in the other PRs, for ease of navigation?

pjannaty avatar Jul 20 '22 17:07 pjannaty

cc @benbarsdell

kaixih avatar Jul 25 '22 22:07 kaixih

Rebased and marked as "Ready to review".

kaixih avatar Jul 27 '22 17:07 kaixih

The rebase is finished. @ezhulenev and @reedwm to review. Thx.

kaixih avatar Aug 18 '22 20:08 kaixih

Since the stream executor is being moved out, I have to rebase these PRs more frequently than before. The rebase is done. @ezhulenev and @reedwm to review. Thx.

kaixih avatar Aug 25 '22 00:08 kaixih