Ilia Sergachev
It should run on H100. Is this https://github.com/google/jax/pull/22699/files#diff-77b54950a53c3196a56e8f570cb6dcd4eca602b5a8b4220f5cd2acb86f060e7fR1548 not sufficient to filter by GPU type?
Anyway, I looked at how other tests handle this and added a check with skipTest(). The feature actually works on Ampere and newer.
I tested it on A100.
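The skipTest() guard mentioned above can be sketched roughly like this (the `cuda_compute_capability` helper is hypothetical and hard-coded here; a real JAX test would query the device, e.g. via `jax.devices()[0]`):

```python
import unittest


def cuda_compute_capability():
    # Hypothetical helper for this sketch: returns the GPU's compute
    # capability as a (major, minor) tuple. Hard-coded to Ampere (8, 0)
    # so the sketch runs without a GPU; a real test would query the device.
    return (8, 0)


class CudnnFusionTest(unittest.TestCase):
    def test_fusion_ampere_only(self):
        # The fusion under test only works on Ampere (sm_80) and newer,
        # so skip (rather than fail) on older GPUs.
        if cuda_compute_capability() < (8, 0):
            self.skipTest("requires compute capability >= 8.0 (Ampere)")
        self.assertTrue(True)  # placeholder for the real test body
```

On an A100 (sm_80) the guard passes and the test body runs; on, say, a V100 (sm_70) the test is reported as skipped instead of failing.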
Indeed, I examined the tests we have (https://github.com/openxla/xla/blob/main/xla/service/gpu/transforms/cudnn_custom_call_converter_test.cc#L27, https://github.com/openxla/xla/blob/main/xla/service/gpu/autotuning/gemm_fusion_autotuner_test.cc#L709) and realised that the latter one relies on xla_gpu_cublas_fallback(false). Fix: https://github.com/google/jax/pull/23505
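For context, the same debug option can be set from the environment via XLA_FLAGS; a minimal sketch (assuming the standard `--xla_gpu_cublas_fallback` flag spelling, set before JAX/XLA initializes its backend):

```python
import os

# Disable XLA's cuBLAS fallback so the GEMM fusion autotuner cannot
# fall back to a plain cuBLAS call. This must be set before the first
# JAX/XLA backend initialization to take effect.
os.environ["XLA_FLAGS"] = (
    os.environ.get("XLA_FLAGS", "") + " --xla_gpu_cublas_fallback=false"
)
```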
> AMD ROCm fails due to merge conflict:
> Any suggestions on how to proceed?

It fails for other PRs too, for example https://github.com/openxla/xla/pull/15930, and is non-blocking.
Indeed, I haven't used or updated this repository for a while, but it works for me as-is from a clean clone with all the registered git submodule versions: ...