pjannaty

6 comments by pjannaty

Discussed offline the alternative approach of only reverting [55920](https://github.com/tensorflow/tensorflow/pull/55920), but decided to revert all three PRs, so that the default NHWC usage is disallowed for fp32/tf32 regardless of graph level ([55761](https://github.com/tensorflow/tensorflow/pull/55761))...

Can we change the base from master to Part1? Once Part1 is merged, we can rebase and change the base back to master.

Can we also hyperlink the descendant PRs, i.e. [Part 2: Fused matmul op supports cudnn matmul fusion.](https://github.com/tensorflow/tensorflow/pull/56826), and likewise in the other PRs, for ease of navigation?

> @reedwm This is largely independent of the requirement of transposing A but not B. The column-major/row-major layout returned by GemmConfig::For doesn’t fully describe the configuration of a GEMM. Not...

Eventually, can we please streamline and shorten the tests by parametrizing them? That would also help readability: https://google.github.io/googletest/reference/testing.html#TEST_P

It's hard to tell why TRT does not show a memory-usage reduction here. We do have an experimental PR that you may want to use at your discretion to see...