xla
A machine learning compiler for GPUs, CPUs, and ML accelerators
Make TpuExecutor use StreamExecutorInterface to create Events.
[XLA] [NFC] Serialize all autotuning results. The previous logic for filtering by-module wasn't correct, as instructions could be modified after autotuning, resulting in not all relevant information being serialized. This could...
It's more granular than the existing --xla_gpu_deterministic_ops because it allows running an autotuning compilation with non-deterministic ops disabled. --xla_gpu_deterministic_ops is a superset of --xla_gpu_exclude_nondeterministic_ops, so setting --xla_gpu_deterministic_ops=true also sets --xla_gpu_exclude_nondeterministic_ops=true...
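A minimal sketch of the superset relationship described above; the GpuFlags struct and helper function are hypothetical and only illustrate how the two flags combine:

```cpp
// Hypothetical struct mirroring the two debug flags; not part of the XLA API.
struct GpuFlags {
  bool xla_gpu_deterministic_ops = false;
  bool xla_gpu_exclude_nondeterministic_ops = false;
};

// Nondeterministic ops are excluded if either flag is set: full determinism
// implies exclusion, but exclusion alone does not force full determinism.
bool ExcludeNondeterministicOps(const GpuFlags& flags) {
  return flags.xla_gpu_deterministic_ops ||
         flags.xla_gpu_exclude_nondeterministic_ops;
}
```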
Remove `TARGET_FILTER` from build script, tag tests that should be filtered
Use StreamExecutorInterface::CreateEvent in event_pool.cc.
Add option to XLA to enforce inlining before LLVM SplitModule, or set PreserveLocals=false to get more balanced splits in the parallel compilation case. Some data from a GPT-3 5B model with different...
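A rough sketch of the splitting step this refers to, assuming a recent LLVM where llvm::SplitModule takes a Module& and a PreserveLocals flag; the wrapper function name is made up for illustration:

```cpp
#include <memory>
#include <vector>

#include "llvm/IR/Module.h"
#include "llvm/Transforms/Utils/SplitModule.h"

// Split one LLVM module into `num_parts` pieces for parallel compilation.
// Passing PreserveLocals=false lets the splitter externalize local symbols,
// which tends to produce more balanced partitions.
std::vector<std::unique_ptr<llvm::Module>> SplitForParallelCompilation(
    llvm::Module& module, unsigned num_parts, bool preserve_locals) {
  std::vector<std::unique_ptr<llvm::Module>> parts;
  llvm::SplitModule(
      module, num_parts,
      [&parts](std::unique_ptr<llvm::Module> part) {
        parts.push_back(std::move(part));
      },
      /*PreserveLocals=*/preserve_locals);
  return parts;
}
```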
Use absl::Status instead of xla::Status now that they're identical.
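Since xla::Status was already an alias for absl::Status, the change is mechanical; a hypothetical call site simply spells the type as absl::Status:

```cpp
#include <cstdint>

#include "absl/status/status.h"

// Hypothetical call site: previously this return type would have been
// spelled xla::Status, which aliased absl::Status.
absl::Status CheckBufferSize(int64_t size_bytes) {
  if (size_bytes <= 0) {
    return absl::InvalidArgumentError("buffer size must be positive");
  }
  return absl::OkStatus();
}
```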
Propagate the error to the output if an input buffer has an error.
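A minimal sketch of the propagation idea, with a made-up buffer type and execute function; only the early return on an errored input reflects the change described:

```cpp
#include <vector>

#include "absl/status/status.h"
#include "absl/status/statusor.h"
#include "absl/types/span.h"

// Hypothetical buffer type, for illustration only.
struct DeviceBuffer {
  std::vector<float> data;
};

// If any input buffer already holds an error, forward that error to the
// output instead of executing on poisoned inputs.
absl::StatusOr<DeviceBuffer> Execute(
    absl::Span<const absl::StatusOr<DeviceBuffer>> inputs) {
  for (const absl::StatusOr<DeviceBuffer>& input : inputs) {
    if (!input.ok()) {
      return input.status();
    }
  }
  return DeviceBuffer{};  // Placeholder for the real computation.
}
```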