xla
A machine learning compiler for GPUs, CPUs, and ML accelerators
Run mobilenet_v2 HLO benchmark on CPU in A/B diff script.
Add PJRT TPU AOT device_attributes support to PjRtDeviceTopology. Also adds hidden APIs that allow compiling AOT.
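The hidden topology-based AOT APIs mentioned above are internal; as a hedged, user-level illustration of ahead-of-time compilation against XLA, a minimal sketch using the public JAX lower/compile path (the function and shapes are made up):

```python
# A minimal sketch of AOT compilation via JAX's public lower/compile API.
# This is an illustration only, not the hidden PJRT topology APIs referenced
# in the change above; the function and shapes are made up.
import jax
import jax.numpy as jnp

def f(x):
    return jnp.tanh(x) * 2.0

# Trace and lower against an abstract argument (no concrete data),
# then compile ahead of time.
abstract_arg = jax.ShapeDtypeStruct((8, 128), jnp.float32)
compiled = jax.jit(f).lower(abstract_arg).compile()
print(compiled(jnp.ones((8, 128), jnp.float32)))
```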
[MemorySpaceAssignment] Add a flag to tune async copy start locations for operands greater than a given size.
Remove Execute and ExecuteWithToken from py_executable.cc
[XLA:GPU] Remove (now unnecessary) Triton-specific kernel reuse. Now that we have general fusion kernel reuse, the Triton-specific reuse is no longer needed. This change should not have any runtime...
Real fix for b/273369126 (also resolves b/273583026). Removes the `YieldUnsafeUnsortedEvents()` API, removes `MaybeDropEventsForTraceViewer`, and distributes this logic between `TraceContainer::EventSlice` and the JSON serializer. `TraceContainer` is now a safe ARC...
[xla:runtime] Add support for passing async values to the runtime executable. This change adds support for passing async values to a runtime executable, e.g., ``` async.func @test(%arg0: !async.value, %arg1: i32) ->...
Make *_hdrs targets depend only on headers. This prevents API users from accidentally compiling in the implementations.
Move //third_party/tensorflow/compiler/xla/service:hlo_{cost_analysis, creation_utils, query} and tests to //third_party/tensorflow/compiler/xla/hlo/utils and update all users.
Improve handling of dynamic shapes in JAX native serialization.
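As a rough sketch of how dynamic (shape-polymorphic) dimensions appear in native serialization from the user side, assuming the installed JAX version exposes jax2tf's `native_serialization` and `polymorphic_shapes` options (the function and dimensions below are made up):

```python
# A hedged sketch: serialize a JAX function with a symbolic batch dimension "b"
# via jax2tf native serialization, which lowers to StableHLO. Assumes a JAX
# version that exposes `native_serialization` and `polymorphic_shapes`.
import jax.numpy as jnp
import tensorflow as tf
from jax.experimental import jax2tf

def f(x):
    return jnp.sum(x * x, axis=-1)

f_tf = tf.function(
    jax2tf.convert(f, native_serialization=True, polymorphic_shapes=["(b, 16)"]),
    autograph=False,
    input_signature=[tf.TensorSpec([None, 16], tf.float32)],
)
# The same serialized function handles different batch sizes.
print(f_tf(tf.ones([4, 16], tf.float32)))
print(f_tf(tf.ones([7, 16], tf.float32)))
```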