
[torchbench] `hf_T5_generate` inference running significantly slower than inductor.

Open · ysiraichi opened this issue 1 year ago · 1 comment

🐛 Bug

```
python xla/benchmarks/experiment_runner.py \
    --suite-name torchbench --accelerator cuda \
    --xla PJRT --dynamo None --test eval \
    --no-resume --print-subprocess \
    -k hf_T5_generate
```
| benchmark | compilation time (s) | iteration time (s) |
|---|---|---|
| hf_T5_generate | 396 (430) | ~40 (4) |

(inductor time is shown in parenthesis)

Environment

  • Reproducible on XLA backend [CPU/TPU/CUDA]: CUDA
  • torch_xla version: a5692c206

cc @miladm

— ysiraichi, Feb 15 '24

Here's an odd thing I've noticed: there's some compilation taking place in the second iteration:

| iteration | time (s) | CompileTime count |
|---|---|---|
| 1 | 440 | 1281 |
| 2 | 70 | 202 |
| 3 | 40 | 0 |

After the second iteration, I observe no further compilation. Inductor, by contrast, does not exhibit this behavior: its second-iteration time is identical to the later ones.
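One way to pin down which iteration triggers recompilation is to diff a compile counter around each iteration. The helper below is a hypothetical sketch (it is not part of torch_xla); in a real run, `read_count` would read torch_xla's `CompileTime` metric, e.g. via `torch_xla.debug.metrics.metric_data("CompileTime")` (assumption: the first element of the returned tuple is the sample count).

```python
def compile_delta(read_count, run_iteration):
    """Return how much a compile counter grew during one iteration.

    read_count: zero-arg callable returning the current compile count.
        In a real torch_xla run this might be (assumption):
            import torch_xla.debug.metrics as met
            read_count = lambda: met.metric_data("CompileTime")[0]
    run_iteration: zero-arg callable that executes one benchmark iteration.
    """
    before = read_count()
    run_iteration()
    return read_count() - before
```

Calling this for each of the three iterations above should reproduce the per-iteration counts in the table (1281, 202, 0), making it easy to spot where the extra compilation in iteration 2 comes from.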

— ysiraichi, Feb 15 '24