xla
A machine learning compiler for GPUs, CPUs, and ML accelerators
Automated Code Change
To enable ReLU epilogue fusion for cuBLASLt matmuls in training, two pairs of epilogues are added: (RELU_AUX, DRELU) and (BIAS_RELU_AUX, DRELU_BGRAD). The RELU_AUX (or BIAS_RELU_AUX) epilogue for the forward matmul outputs...
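The pairing above can be illustrated with the arithmetic it fuses. This is a minimal pure-Python sketch, not cuBLASLt code, assuming the usual ReLU semantics: the forward epilogue keeps a mask of where the pre-activation was positive, and the backward epilogue uses that mask to zero the matching entries of the backward matmul's result, with the BGRAD variant also summing over rows to get the bias gradient. The helper names (`forward_bias_relu_aux`, `backward_drelu_bgrad`) are hypothetical, and the mask is stored as booleans without modeling the epilogue's real aux-buffer format.

```python
# Hypothetical pure-Python sketch of what the fused epilogue pairs compute.
# Matrices are lists of rows; no cuBLASLt API is used or modeled here.

def matmul(a, b):
    """Plain matrix product of a (m x k) and b (k x n)."""
    return [[sum(a[i][p] * b[p][j] for p in range(len(b)))
             for j in range(len(b[0]))]
            for i in range(len(a))]

def forward_bias_relu_aux(x, w, bias):
    """Forward: out = relu(x @ w + bias); aux records where the pre-activation > 0."""
    out, aux = [], []
    for row in matmul(x, w):
        pre = [v + bj for v, bj in zip(row, bias)]
        out.append([max(v, 0.0) for v in pre])
        aux.append([v > 0.0 for v in pre])
    return out, aux

def backward_drelu_bgrad(a, b, aux):
    """Backward: mask the matmul result a @ b with aux (the DRELU part),
    then sum the masked gradient over rows to get the bias gradient (BGRAD)."""
    g = matmul(a, b)
    dz = [[v if keep else 0.0 for v, keep in zip(row, mask)]
          for row, mask in zip(g, aux)]
    dbias = [sum(col) for col in zip(*dz)]
    return dz, dbias
```

For example, `forward_bias_relu_aux([[1.0, -2.0], [-1.0, 3.0]], [[1.0, 0.0], [0.0, 1.0]], [0.5, 0.5])` gives the output `[[1.5, 0.0], [0.0, 3.5]]` and the mask `[[True, False], [False, True]]`. The forward pass saves only the mask, so the backward pass never has to recompute the pre-activation.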
Check if upgrading to 1.0.0 breaks anything
XLA - Remove CreateToken, integ
Sink broadcast(constant) into while body. It is possible to sink the initialization broadcast into the while body and replace it with a free allocate-buffer custom-call if the entire shape of...
[xla:cpu] Don't forget to release SimpleOrcJit resources after compiling is done
Handle multiple users in all-gather dynamic-slice cancellation. Add CancelAllGatherDynamicSlice pass
I have found the pattern below in some models with poor SPMD partitioning:
```
all-gather.1 = all-gather(x)
dot.1 = dot(all-gather.1, y)
dynamic-slice.1 = dynamic-slice(all-gather.1) // can be cancelled
```
...
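The reason the dynamic-slice in this pattern can be cancelled can be checked with a toy simulation. This is a hypothetical sketch rather than the CancelAllGatherDynamicSlice pass itself. It assumes equal-sized shards gathered along dimension 0, and assumes each device slices its own rows at offset `device_id * shard_rows`, which is the case where the slice equals the local input `x`. The truncated message does not describe how the real pass matches offsets. In the pattern, `dot.1` still uses `all-gather.1`, so only the dynamic-slice user is replaced, and the all-gather itself remains.

```python
# Hypothetical toy model: each "device" holds one shard (a list of rows).

def all_gather(shards):
    """Concatenate every device's shard along dimension 0."""
    gathered = []
    for shard in shards:
        gathered.extend(shard)
    return gathered

def dynamic_slice(operand, start, size):
    """Take `size` rows of `operand` starting at row `start`."""
    return operand[start:start + size]

def check_cancellation(shards):
    """True if every device slicing its own rows out of the all-gather
    gets back exactly its original shard (so the slice could become x)."""
    gathered = all_gather(shards)
    for device_id, x in enumerate(shards):
        shard_rows = len(x)
        start = device_id * shard_rows  # assumed offset of the local shard
        if dynamic_slice(gathered, start, shard_rows) != x:
            return False
    return True
```

With `shards = [[[1, 2]], [[3, 4]], [[5, 6]]]`, `check_cancellation(shards)` holds. If device 1 sliced at offset 0 instead, it would get `[[1, 2]]` rather than its own shard, so the rewrite would not be valid.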