xla
A machine learning compiler for GPUs, CPUs, and ML accelerators
Refactor OSS Trace proto into Trace metadata and a TraceContainer. This CL introduces an OSS TraceContainer type that is very similar to the internal TraceEventsContainer. End goal is to either...
Enroll EagerOperations to DTensor caching. This works around a leak problem when the same Eager operation is executed multiple times. A more proper fix is to register a notifier_fn when...
[LatencyHidingScheduler] Add ProfileGuidedLatencyEstimator.
Add `-windows_excluded` to TF build/test tag filters. Currently, `no_windows` is used to exclude a test from running in the Windows environment. However, it is difficult to distinguish between temporary and...
[LatencyHidingScheduler] Add an option to place host send and send-done as early in the schedule as possible. Controlled by the enable_send_recv_post_process_scheduling scheduler config option.
Merge C++ and Mesh implementation of most Mesh methods. C++ becomes the source of truth for Mesh. Changes are in layout.py, tensor_layout.[cc|h], and the pywrap_dtensor_device.cc file. Many attribute methods are...
[LatencyHidingScheduler] Add optional pass that moves host send-done to just before the following send.
Better flag alignment for JIT/AOT paths. - Hook up the flag for the new deallocator on the JIT path. - Hook up the flag for dumping snapshots on the AOT...
PR #59936: [NVIDIA XLA] Disable TF32 evaluation for SelfAdjointEigTest cases. Imported from GitHub PR https://github.com/tensorflow/tensorflow/pull/59936 When run on Ampere and newer GPUs some of the self adjoint eigenvalue test cases...