[torchbench] `hf_T5_generate` inference running significantly slower than inductor.
## 🐛 Bug
```sh
python xla/benchmarks/experiment_runner.py \
    --suite-name torchbench --accelerator cuda \
    --xla PJRT --dynamo None --test eval \
    --no-resume --print-subprocess \
    -k hf_T5_generate
```
| benchmark | compilation time (s) | iteration time (s) |
|---|---|---|
| hf_T5_generate | 396 (430) | ~40 (4) |

(Inductor times are shown in parentheses.)
## Environment
- Reproducible on XLA backend [CPU/TPU/CUDA]: CUDA
- torch_xla version: a5692c206
cc @miladm
Here's an odd thing I noticed: some compilation still takes place in the second iteration:
| iteration | time (s) | `CompileTime` count |
|---|---|---|
| 1 | 440 | 1281 |
| 2 | 70 | 202 |
| 3 | 40 | 0 |
After the second iteration, I observe no more compilation. Inductor, by contrast, does not seem to have this issue: its second-iteration timing looks identical to the rest.
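
For anyone who wants to reproduce the per-iteration `CompileTime` counts outside the benchmark harness, here is a minimal sketch of the measurement. A toy `torch.nn.Linear` stands in for the actual hf_T5_generate model (the numbers in the tables above come from `experiment_runner.py`, not this script):

```python
import time

import torch
import torch_xla.core.xla_model as xm
import torch_xla.debug.metrics as met

device = xm.xla_device()

# Stand-in model and inputs; the real benchmark drives hf_T5_generate
# through experiment_runner.py instead.
model = torch.nn.Linear(512, 512).to(device)
inputs = torch.randn(8, 512, device=device)

for i in range(3):
    met.clear_all()  # reset counters/metrics so each iteration is isolated
    start = time.perf_counter()
    out = model(inputs)
    xm.mark_step()        # cut the lazy graph and trigger compilation/execution
    xm.wait_device_ops()  # block until the device work actually finishes
    elapsed = time.perf_counter() - start
    data = met.metric_data('CompileTime')  # (count, total_time, samples) or None
    count = data[0] if data is not None else 0
    print(f"iteration {i + 1}: {elapsed:.3f}s, CompileTime count = {count}")
```

With this pattern a fresh compilation shows up as a nonzero `CompileTime` count in the iteration where it happens, which is how the second-iteration compilations in the table were spotted.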