xla

A machine learning compiler for GPUs, CPUs, and ML accelerators

Results 653 xla issues

Enable fusing effective-scalar dynamic slices into DUS (dynamic-update-slice).

For the current FP8 GEMM, we set `c_scale` to one, although it is effectively never used. Newer cuBLASLt versions, however, have a stricter requirement that `c_scale` can be set only when...

Allow fusing epilogues whose operands are broadcasts of effective-scalar instructions. This enables creating fusions for FP8 where the pattern is `mul(dot, scalar_ops)` and the scalar ops' shapes are either [] or...
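The `mul(dot, scalar_ops)` pattern above can be sketched in plain Python as an unfused reference. This is only an illustration of the data flow, not XLA's implementation: every helper name here (`is_effective_scalar`, `dot`, `broadcast_scalar`, `mul`, `scaled_dot`) is hypothetical, and treating "effective scalar" as "rank 0 or every dimension of size 1" is an assumption, since the issue text is truncated.

```python
def is_effective_scalar(shape):
    """Assumed definition: rank 0, or every dimension has size 1."""
    return all(dim == 1 for dim in shape)


def dot(a, b):
    """Plain matrix product of two row-major nested lists."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)]
            for row in a]


def broadcast_scalar(value, rows, cols):
    """Materialize a scalar as a rows x cols matrix (the broadcast operand)."""
    return [[value] * cols for _ in range(rows)]


def mul(a, b):
    """Elementwise product of two equally shaped matrices."""
    return [[x * y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]


def scaled_dot(a, b, scale):
    """Unfused reference for mul(dot(a, b), broadcast(scale))."""
    d = dot(a, b)
    return mul(d, broadcast_scalar(scale, len(d), len(d[0])))


a = [[1.0, 2.0], [3.0, 4.0]]
b = [[5.0, 6.0], [7.0, 8.0]]
result = scaled_dot(a, b, 0.5)
print(result)
```

A fused epilogue would apply the scale while the GEMM result is being written, so the broadcast matrix is never materialized. The sketch only shows which values the fused form has to reproduce.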

When using `--xla_gpu_enable_nccl_comm_splitting=true`, a deadlock can occur if one or more subgroups of a split were already created and those devices reuse it from the clique...

Add a version of CreateBuffersForAsyncHostToDevice that takes a custom layout.

Add a utility function for determining collectives that are not inside custom fusions.

The added `Result` structure will be used to add support for slicing.

In `bazel_query.yml`, query for `deps(//xla/...)` instead. This is consistent with https://github.com/tensorflow/tensorflow/blob/master/ci/official/utilities/code_check_full.bats#L312