Trevor Morris
Fixes #56630. Commit https://github.com/tensorflow/tensorflow/commit/ba57ae7f24743e684accef3521485a24c1235186 introduced two bugs into the cuBLAS build: 1. It referenced an API, `cublasGetStatusString`, which was not yet in the cuBLAS stubs. I regenerated the stubs...
With TensorFlow 2.5, there have been some code changes which require CUDA 11. You can read more about the issues when trying to build TF 2.5 against CUDA 10 here:...
### Issue type
Bug
### Have you reproduced the bug with TensorFlow Nightly?
Yes
### Source
source
### TensorFlow version
TF 2.15
### Custom code
No
### Current behavior?
Previously,...
Extend to an arbitrary number of channels by duplicating the kernel N times instead of 3.
This PR adds multinode support to the multihost_hlo_runner. To use, there is a new command line argument `address` for the address of the coordinator/root node. Example usage with SLURM: ```...
When using `--xla_gpu_enable_nccl_comm_splitting=true`, a deadlock can occur if one or more subgroups of a split were already created and those devices reuse it from the clique...
Adds python bindings for `xla_gpu_kernel_cache_file`, `xla_gpu_enable_llvm_module_compilation_parallelism` and `xla_gpu_per_fusion_autotune_cache_dir`. We would like to add some convenience features to JAX which will enable all caches with one flag/option (will open PR for...
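For context, a minimal sketch of how the three options named above might be passed to XLA through the `XLA_FLAGS` environment variable, assuming they are accepted as ordinary `--name=value` debug flags. The helper `with_xla_flags` and the `/tmp/xla_cache` paths are hypothetical and are not part of the PR; the Python bindings the PR adds are not shown here.

```python
import os


def with_xla_flags(existing: str, **flags) -> str:
    """Append --name=value flags to an existing XLA_FLAGS string.

    Hypothetical helper for illustration only.
    """
    parts = existing.split()
    for name, value in flags.items():
        # XLA boolean flags are written as lowercase true/false.
        if isinstance(value, bool):
            value = "true" if value else "false"
        parts.append(f"--{name}={value}")
    return " ".join(parts)


# The cache locations below are placeholders (assumptions).
flags = with_xla_flags(
    os.environ.get("XLA_FLAGS", ""),
    xla_gpu_kernel_cache_file="/tmp/xla_cache/kernels.pb",
    xla_gpu_per_fusion_autotune_cache_dir="/tmp/xla_cache/autotune",
    xla_gpu_enable_llvm_module_compilation_parallelism=True,
)

# Set this before the XLA backend is initialized so the flags are picked up.
os.environ["XLA_FLAGS"] = flags
print(flags)
```

Appending to any existing `XLA_FLAGS` value, rather than overwriting it, keeps flags that were already set elsewhere in the environment.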
This PR makes it easier to enable all of the caching features in JAX and XLA with a single option. Now, when the JAX persistent cache is enabled (`JAX_COMPILATION_CACHE_DIR`), some...
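As a rough illustration of the entry point this PR builds on, the sketch below enables the JAX persistent cache by setting `JAX_COMPILATION_CACHE_DIR` before JAX is imported. The helper `enable_persistent_cache` and the directory name `jax_cache_example` are hypothetical; which additional XLA caches are switched on as a result is up to the PR and is not reproduced here.

```python
import os
import tempfile


def enable_persistent_cache(cache_dir: str) -> str:
    """Create the cache directory and point JAX at it via the environment.

    Hypothetical helper for illustration only.
    """
    os.makedirs(cache_dir, exist_ok=True)
    os.environ["JAX_COMPILATION_CACHE_DIR"] = cache_dir
    return cache_dir


# Placeholder location (assumption); any writable directory should do.
cache_dir = enable_persistent_cache(
    os.path.join(tempfile.gettempdir(), "jax_cache_example")
)
print(cache_dir)

# Import JAX only after the variable is set, so the setting is read at startup:
# import jax
```

Setting the variable in the process environment, rather than in code that runs after `import jax`, avoids depending on the order in which JAX reads its configuration.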