[torchbench] `cm3leon_generate` inference running significantly slower than inductor.
## 🐛 Bug
Repro:

```bash
python xla/benchmarks/experiment_runner.py \
    --suite-name torchbench --accelerator cuda \
    --xla PJRT --dynamo None --test eval \
    --no-resume --print-subprocess \
    -k cm3leon_generate
```
| benchmark | compilation time (s) | iteration time (s) |
|---|---|---|
| cm3leon_generate | 868 (850) | ~19 (1.65) |

(Inductor times are shown in parentheses.)
## Environment
- Reproducible on XLA backend [CPU/TPU/CUDA]: CUDA
- torch_xla version: a5692c206
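For completeness, a quick way to confirm the builds under test; this is just the standard `__version__` attributes, which may be formatted differently in custom builds:

```python
# Print the PyTorch and torch_xla builds being benchmarked.
import torch
import torch_xla

print('torch:', torch.__version__)
print('torch_xla:', torch_xla.__version__)  # built from commit a5692c206 above
```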
cc @miladm
Here's an odd thing I've noticed: there's some compilation taking place in the second iteration:
| iteration | time (s) | CompileTime count |
|---|---|---|
| 1 | 887 | 1526 |
| 2 | 200 | 415 |
| 3 | 19 | 0 |
After the second iteration, I observe no more compilation. Inductor, in contrast, does not seem to have this issue: its second-iteration timing looks identical to all subsequent iterations.
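For reference, here is a minimal sketch of how per-iteration CompileTime counts like the ones above can be gathered with torch_xla's debug metrics API. The `model`/`inputs` names are placeholders standing in for whatever the benchmark harness builds (they are not from the harness itself), and the exact shape of `met.metric_data()`'s return value may vary between torch_xla versions:

```python
# Minimal sketch: time each inference iteration and count how many XLA
# compilations it triggered. Assumes `model` and `inputs` already live on
# an XLA device.
import time

import torch_xla.core.xla_model as xm
import torch_xla.debug.metrics as met

for i in range(3):
    met.clear_all()  # reset counters/metrics so each iteration is isolated
    start = time.perf_counter()
    output = model(*inputs)
    xm.mark_step()        # cut the lazy-tensor graph and dispatch it to XLA
    xm.wait_device_ops()  # block until the device finishes, for honest timing
    elapsed = time.perf_counter() - start

    data = met.metric_data('CompileTime')  # None if nothing was compiled
    compiles = data[0] if data is not None else 0
    print(f'iteration {i + 1}: {elapsed:.1f}s, CompileTime count = {compiles}')
```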