[XLA:GPU] Cleanup: Use the new calling convention for non-fusion operations too and remove the old one
Reminder about the new calling convention:
- We pass all arguments and output buffers separately instead of as a single temp_buff. Constants are also passed as kernel arguments.
- As a result, kernels contain no hardcoded offsets into their argument buffers and no references to constants; they simply receive direct pointers to them.
- If a kernel call has multiple parameters that are the same pointer, that pointer is passed only once, which enables better optimizations.
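
The three points above can be illustrated with a minimal host-side C++ sketch. It is not the XLA implementation: every name in it (AddOldStyle, AddNewStyle, KernelArgs, DeduplicateArgs) and the fixed offsets are hypothetical and exist only to contrast the two conventions and to show one possible way to pass a repeated pointer once.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Old style (illustrative only): the kernel receives one base pointer and
// locates its operands through offsets baked into the kernel body.
void AddOldStyle(float* temp_buf, std::size_t n) {
  // Hypothetical hardcoded layout: lhs at 0, rhs at 4, output at 8.
  constexpr std::size_t kLhsOffset = 0, kRhsOffset = 4, kOutOffset = 8;
  assert(n <= 4);  // the fixed layout only leaves room for 4 elements each
  for (std::size_t i = 0; i < n; ++i) {
    temp_buf[kOutOffset + i] = temp_buf[kLhsOffset + i] + temp_buf[kRhsOffset + i];
  }
}

// New style (illustrative only): every buffer, constants included, arrives
// as its own pointer, so the body contains no offsets at all.
void AddNewStyle(const float* lhs, const float* rhs, float* out, std::size_t n) {
  for (std::size_t i = 0; i < n; ++i) {
    out[i] = lhs[i] + rhs[i];
  }
}

// Hypothetical container for a deduplicated argument list.
struct KernelArgs {
  std::vector<const void*> unique_ptrs;    // pointers actually passed
  std::vector<std::size_t> param_to_arg;   // logical parameter -> index in unique_ptrs
};

// Passes each distinct pointer once and records which slot every logical
// parameter maps to. A linear scan is enough for short argument lists.
KernelArgs DeduplicateArgs(const std::vector<const void*>& params) {
  KernelArgs args;
  for (const void* p : params) {
    std::size_t slot = 0;
    while (slot < args.unique_ptrs.size() && args.unique_ptrs[slot] != p) {
      ++slot;
    }
    if (slot == args.unique_ptrs.size()) {
      args.unique_ptrs.push_back(p);  // first time this pointer is seen
    }
    args.param_to_arg.push_back(slot);
  }
  return args;
}
```

For example, calling DeduplicateArgs on the parameter list {a, a, out} yields two unique pointers, with both of the first two parameters mapped to slot 0. A kernel generated from such a list can see that those two parameters are the same buffer, which is the kind of information the "passed only once" rule is meant to expose.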
This does not change kernel reuse:
- Fusion kernels are still reused.
- Non-fusion kernels are still not reused.
This CL should not have a considerable effect on runtime performance or compilation time. Tested with internal benchmarks.
Test: Fixed HLO tests and added a new calling_convention.hlo test.