
Failing Torchbench Models: tracking issue

ysiraichi opened this issue • 32 comments

Summary of Contributions (9th Feb)

  1. Increase the number of TorchBench models that work with Dynamo as a tracer: the pass rates are now comparable to those of torch.compile using Inductor. Some of the fixes also improved the previous (non-Dynamo) tracer that PyTorch/XLA used.

    | | Inference | Training |
    |---|---|---|
    | Inductor | 87 | 63 |
    | Dynamo | 60 to 82 | 41 to 53 |
    | Non-Dynamo | 79 to 82 | 54 to 56 |
  2. Improve the benchmarking tools used by Google: the initial Google runs benchmarking these models showed a discrepancy of about 15 models relative to the results reported here. We identified and fixed 10+ issues, which helped reconcile Google's benchmarks with the reported ones and, in turn, with the PyTorch HUD.

Current State

This post has two lists:

  • Failing inference models
  • Failing training models

Each list shows the models that fail when:

  • Tracing without Dynamo (eager mode)
  • Tracing with Dynamo into openxla (Dynamo+openxla)
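
For readers less familiar with the two modes, here is a minimal sketch of how each one traces a workload. It is not taken from the benchmarking scripts; the model and input are placeholders.

```python
import torch
import torch_xla.core.xla_model as xm

device = xm.xla_device()
model = torch.nn.Linear(8, 8).to(device)  # placeholder model
x = torch.randn(4, 8, device=device)      # placeholder input

# Non-Dynamo (eager mode): operations are recorded lazily, and the accumulated
# graph is compiled and executed by XLA when mark_step() is reached.
out = model(x)
xm.mark_step()

# Dynamo + openxla: torch.compile captures the graph up front and hands it to
# the openxla backend for compilation.
compiled_model = torch.compile(model, backend="openxla")
out = compiled_model(x)
```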

These lists were created using the benchmarking scripts that currently live upstream, by running the following command:

python xla/benchmarks/experiment_runner.py \
       --suite-name torchbench \
       --accelerator cuda \
       --xla PJRT --xla None \
       --dynamo openxla --dynamo inductor --dynamo None \
       --test eval --test train \
       --repeat 30 --iterations-per-run 5 \
       --print-subprocess \
       --no-resume

Environment

  • GPU: A100 40GB

Inference

Non-Dynamo. Pass rate: 87/99 (87%)

  • [x] DALLE2_pytorch
    • Issue: #6010
      • PyTorch/XLA PR: #6060
      • PyTorch/XLA PR: #6071
  • [ ] cm3leon_generate
    • Issue: #6004
  • [ ] hf_Longformer
    • Issue: #5835
  • [ ] hf_T5_generate
    • Issue: #6004
  • [ ] moco
    • Issue: #6083
    • Issue: #6011
      • PyTorch/XLA PR: #6296
      • PyTorch/XLA PR: #6076
    • Issue: #6010
      • PyTorch/XLA PR: #6060
      • PyTorch/XLA PR: #6071
  • [x] nvidia_deeprecommender
    • Issue: #6006
      • PyTorch/XLA PR: #6296
      • PyTorch/XLA PR: #6076
  • [x] pytorch_CycleGAN_and_pix2pix
    • Issue: #6007
      • PyTorch/XLA PR: #6296
      • PyTorch/XLA PR: #6076
  • [ ] simple_gpt
    • RTX 2060 doesn't support BF16
    • Issue: #6011
      • PyTorch/XLA PR: #6296
      • PyTorch/XLA PR: #6076
    • Issue: #6010
      • PyTorch/XLA PR: #6060
      • PyTorch/XLA PR: #6071
  • [ ] simple_gpt_tp_manual
    • RTX 2060 doesn't support BF16
    • Issue: #6011
      • PyTorch/XLA PR: #6296
      • PyTorch/XLA PR: #6076
    • Issue: #6010
      • PyTorch/XLA PR: #6060
      • PyTorch/XLA PR: #6071
  • [ ] tacotron2
    • Issue: #6112
    • Issue: #6010
      • PyTorch/XLA PR: #6060
      • PyTorch/XLA PR: #6071
  • [x] timm_efficientdet
    • Issue: #6011
      • PyTorch/XLA PR: #6296
      • PyTorch/XLA PR: #6076
    • Issue: #6010
      • PyTorch/XLA PR: #6060
      • PyTorch/XLA PR: #6071
  • [ ] vision_maskrcnn
    • PyTorch/XLA PR: #5743
    • PyTorch PR: https://github.com/pytorch/pytorch/pull/112202
    • Skipped because of incompatible model and experiment configurations

Dynamo+openxla. Pass rate: 86/99 (86%)

  • [x] DALLE2_pytorch
    • Issue: #6010
      • PyTorch/XLA PR: #6060
      • PyTorch/XLA PR: #6071
  • [x] Super_SloMo
    • PyTorch/XLA PR: #5707
    • PyTorch/benchmark PR: https://github.com/pytorch/benchmark/pull/2038
  • [ ] cm3leon_generate
    • Issue: #5967
  • [x] detectron2_fasterrcnn_r_101_c4
    • Issue: #5966
      • PyTorch/XLA PR: #6170
  • [x] detectron2_fasterrcnn_r_101_dc5
    • Issue: #5966
      • PyTorch/XLA PR: #6170
  • [x] detectron2_fasterrcnn_r_101_fpn
    • Issue: #5966
      • PyTorch/XLA PR: #6170
  • [x] detectron2_fasterrcnn_r_50_c4
    • Issue: #5966
      • PyTorch/XLA PR: #6170
  • [x] detectron2_fasterrcnn_r_50_dc5
    • Issue: #5966
      • PyTorch/XLA PR: #6170
  • [x] detectron2_fasterrcnn_r_50_fpn
    • Issue: #5966
      • PyTorch/XLA PR: #6170
  • [x] detectron2_fcos_r_50_fpn
    • Issue: #5966
      • PyTorch/XLA PR: #6170
  • [x] detectron2_maskrcnn
    • Issue: #5966
      • PyTorch/XLA PR: #6170
  • [x] detectron2_maskrcnn_r_101_c4
    • Issue: #5966
      • PyTorch/XLA PR: #6170
  • [x] detectron2_maskrcnn_r_101_fpn
    • Issue: #5966
      • PyTorch/XLA PR: #6170
  • [x] detectron2_maskrcnn_r_50_c4
    • Issue: #5966
      • PyTorch/XLA PR: #6170
  • [x] detectron2_maskrcnn_r_50_fpn
    • Issue: #5966
      • PyTorch/XLA PR: #6170
  • [x] dlrm
    • PyTorch/XLA PR: #5743
    • PyTorch PR: https://github.com/pytorch/pytorch/pull/112202
  • [x] hf_BigBird
    • Issue: #5966
      • PyTorch/XLA PR: #6170
  • [x] hf_GPT2
    • PyTorch/XLA PR: #5922
  • [x] hf_GPT2_large
    • PyTorch/XLA PR: #5922
  • [ ] hf_Longformer
    • Issue: #5835
  • [ ] hf_Reformer
    • Issue: #5837
  • [ ] hf_T5_generate
    • Issue: #5967
  • [ ] moco
    • Issue: #6083
    • Issue: #6011
      • PyTorch/XLA PR: #6296
      • PyTorch/XLA PR: #6076
    • Issue: #6010
      • PyTorch/XLA PR: #6060
      • PyTorch/XLA PR: #6071
  • [x] nvidia_deeprecommender
    • Issue: #6006
      • PyTorch/XLA PR: #6296
      • PyTorch/XLA PR: #6076
  • [x] pyhpc_isoneutral_mixing
    • PyTorch/XLA PR: #5743
    • PyTorch PR: https://github.com/pytorch/pytorch/pull/112202
  • [x] pyhpc_turbulent_kinetic_energy
    • PyTorch/XLA PR: #5743
    • PyTorch PR: https://github.com/pytorch/pytorch/pull/112202
  • [x] pytorch_CycleGAN_and_pix2pix
    • Issue: #6007
      • PyTorch/XLA PR: #6296
      • PyTorch/XLA PR: #6076
  • [x] speech_transformer
    • PyTorch/XLA PR: #5823
  • [x] timm_efficientdet
    • Issue: #6011
      • PyTorch/XLA PR: #6296
      • PyTorch/XLA PR: #6076
    • Issue: #6010
      • PyTorch/XLA PR: #6060
      • PyTorch/XLA PR: #6071

Models also Failing on Inductor

Inference Failing on Inductor CUDA with the Same Error

Benchmarks that raise the same error on inductor:

  • [ ] hf_clip
    • 'str' object has no attribute 'shape'
  • [ ] mobilenet_v2_quantized_qat
  • [ ] resnet50_quantized_qat

Inference Failing on Inductor CUDA with Different Errors

  • [ ] doctr_det_predictor
    • Issue: #6005
  • [ ] simple_gpt
    • RTX 2060 doesn't support BF16
    • Issue: #6011
      • PyTorch/XLA PR: #6296
      • PyTorch/XLA PR: #6076
    • Issue: #6010
      • PyTorch/XLA PR: #6060
      • PyTorch/XLA PR: #6071
  • [ ] simple_gpt_tp_manual
    • RTX 2060 doesn't support BF16
    • Issue: #6011
      • PyTorch/XLA PR: #6296
      • PyTorch/XLA PR: #6076
    • Issue: #6010
      • PyTorch/XLA PR: #6060
      • PyTorch/XLA PR: #6071
  • [ ] tacotron2
    • Issue: #6005
    • Issue: #6010
      • PyTorch/XLA PR: #6060
      • PyTorch/XLA PR: #6071

Training

Non-Dynamo. Pass rate: 67/99 (67%)

  • [ ] DALLE2_pytorch
    • Issue: #6084
    • Issue: #6010
      • PyTorch/XLA PR: #6060
      • PyTorch/XLA PR: #6071
  • [ ] demucs
    • Issue: #6003
  • [ ] densenet121
    • Issue: #6003
  • [x] detectron2_fasterrcnn_r_101_c4
    • Issue: #6004
  • [x] detectron2_fasterrcnn_r_101_dc5
    • Issue: #6004
  • [x] detectron2_fasterrcnn_r_101_fpn
    • Issue: #6004
  • [x] detectron2_fasterrcnn_r_50_c4
    • Issue: #6004
  • [x] detectron2_fasterrcnn_r_50_dc5
    • Issue: #6004
  • [x] detectron2_fasterrcnn_r_50_fpn
    • Issue: #6004
  • [ ] detectron2_fcos_r_50_fpn
    • Skipped by the benchmarking script
  • [x] detectron2_maskrcnn_r_101_c4
    • Issue: #6004
  • [x] detectron2_maskrcnn_r_101_fpn
    • Issue: #6004
  • [x] detectron2_maskrcnn_r_50_c4
    • Issue: #6004
  • [x] detectron2_maskrcnn_r_50_fpn
    • Issue: #6004
  • [ ] dlrm
    • Issue: #6008
  • [ ] hf_GPT2_large
    • Issue: #6003
  • [ ] hf_Longformer
    • Issue: #5835
  • [ ] hf_T5_base
    • Issue: #6003
  • [ ] llama_v2_7b_16h
    • Issue: #6003
  • [ ] moco
    • Issue: #6083
    • Issue: #6011
      • PyTorch/XLA PR: #6296
      • PyTorch/XLA PR: #6076
    • Issue: #6010
      • PyTorch/XLA PR: #6060
      • PyTorch/XLA PR: #6071
  • [ ] nvidia_deeprecommender
    • RTX 2060 OOM
    • Issue: #6006
      • PyTorch/XLA PR: #6296
      • PyTorch/XLA PR: #6076
  • [x] pytorch_CycleGAN_and_pix2pix
    • Issue: #6007
      • PyTorch/XLA PR: #6296
      • PyTorch/XLA PR: #6076
  • [ ] stable_diffusion_unet
    • Issue: #6003
  • [ ] tacotron2
    • Issue: #6112
    • Issue: #6010
      • PyTorch/XLA PR: #6060
      • PyTorch/XLA PR: #6071
  • [x] timm_efficientdet
    • Issue: #6011
      • PyTorch/XLA PR: #6296
      • PyTorch/XLA PR: #6076
    • Issue: #6010
      • PyTorch/XLA PR: #6060
      • PyTorch/XLA PR: #6071
  • [ ] timm_nfnet
    • Issue: #6003
  • [ ] timm_vision_transformer_large
    • Issue: #6003
  • [x] yolov3
    • Issue: #6010
      • PyTorch/XLA PR: #6060
      • PyTorch/XLA PR: #6071

Dynamo+openxla. Pass rate: 57/99 (57%)

  • [ ] densenet121
    • Issue: #6003
  • [ ] dlrm
    • Issue: #6008
  • [x] hf_BigBird
    • Issue: #5966
      • PyTorch/XLA PR: #6170
  • [x] hf_GPT2
    • PyTorch/XLA PR: #5922
  • [x] hf_GPT2_large
    • PyTorch/XLA PR: #5922
  • [ ] hf_Longformer
    • Issue: #5835
  • [ ] hf_Reformer
    • Issue: #6009
  • [ ] moco
    • Issue: #6083
    • Issue: #6011
      • PyTorch/XLA PR: #6296
      • PyTorch/XLA PR: #6076
    • Issue: #6010
      • PyTorch/XLA PR: #6060
      • PyTorch/XLA PR: #6071
  • [ ] nvidia_deeprecommender
    • Issue: #6084
    • Issue: #6006
      • PyTorch/XLA PR: #6296
      • PyTorch/XLA PR: #6076
  • [x] pytorch_CycleGAN_and_pix2pix
    • Issue: #6007
      • PyTorch/XLA PR: #6296
      • PyTorch/XLA PR: #6076
  • [ ] stable_diffusion_unet
    • Issue: #6003
  • [ ] timm_efficientdet
    • Issue: #6003
    • Issue: #6011
      • PyTorch/XLA PR: #6296
      • PyTorch/XLA PR: #6076
    • Issue: #6010
      • PyTorch/XLA PR: #6060
      • PyTorch/XLA PR: #6071
  • [x] timm_vision_transformer
    • Issue: #6003
  • [x] torch_multimodal_clip
    • Issue: #6005
  • [x] yolov3
    • Issue: #6010
      • PyTorch/XLA PR: #6060
      • PyTorch/XLA PR: #6071

Models also Failing on Inductor

No Training Support on Inductor CUDA

Benchmarks that raise the error: Model's DEFAULT_TRAIN_BSIZE is not implemented. (A sketch of the TorchBench convention behind this error follows the list below.)

  • [ ] cm3leon_generate
  • [ ] detectron2_fcos_r_50_fpn
  • [ ] doctr_det_predictor
  • [ ] doctr_reco_predictor
  • [ ] hf_T5_generate
  • [ ] llama
  • [ ] phi_1_5
  • [ ] pyhpc_equation_of_state
  • [ ] pyhpc_isoneutral_mixing
  • [ ] pyhpc_turbulent_kinetic_energy
  • [ ] sam
  • [ ] simple_gpt
  • [ ] simple_gpt_tp_manual
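
For context, here is an illustrative sketch of the TorchBench convention behind this error; the class below is a stand-in, not the actual torchbenchmark code. Benchmark model classes declare default batch sizes as class attributes, and requesting a training run for a model that does not define DEFAULT_TRAIN_BSIZE raises the error above.

```python
# Illustrative stand-in for a TorchBench model without training support.
class Model:
    DEFAULT_EVAL_BSIZE = 8      # inference batch size is defined...
    DEFAULT_TRAIN_BSIZE = None  # ...but no training batch size is provided

    def __init__(self, test: str):
        if test == "train" and self.DEFAULT_TRAIN_BSIZE is None:
            raise NotImplementedError(
                "Model's DEFAULT_TRAIN_BSIZE is not implemented.")
        self.batch_size = (self.DEFAULT_TRAIN_BSIZE if test == "train"
                           else self.DEFAULT_EVAL_BSIZE)


Model(test="train")  # raises NotImplementedError, so the train test is skipped
```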

Training Failing on Inductor CUDA with the Same Error

Benchmarks that raise the same error on inductor:

  • [ ] DALLE2_pytorch
    • Issue: #6084
    • Issue: #6010
      • PyTorch/XLA PR: #6060
      • PyTorch/XLA PR: #6071
  • [ ] demucs
    • Issue: #6003
  • [ ] llama_v2_7b_16h
    • Issue: #6003
  • [ ] maml
    • Issue: #6084
  • [ ] timm_vision_transformer_large
    • Issue: #6003
  • [ ] vision_maskrcnn
    • targets should not be none when in training mode
    • Fix https://github.com/pytorch/pytorch/pull/114774

Training Failing on Inductor CUDA with Different Errors

  • [ ] detectron2_fasterrcnn_r_101_c4
    • Issue: #5966
      • PyTorch/XLA PR: #6170
  • [ ] detectron2_fasterrcnn_r_101_dc5
    • Issue: #5966
      • PyTorch/XLA PR: #6170
  • [ ] detectron2_fasterrcnn_r_101_fpn
    • Issue: #5966
      • PyTorch/XLA PR: #6170
  • [ ] detectron2_fasterrcnn_r_50_c4
    • Issue: #5966
      • PyTorch/XLA PR: #6170
  • [ ] detectron2_fasterrcnn_r_50_dc5
    • Issue: #5966
      • PyTorch/XLA PR: #6170
  • [ ] detectron2_fasterrcnn_r_50_fpn
    • Issue: #5966
      • PyTorch/XLA PR: #6170
  • [ ] detectron2_maskrcnn
    • Issue: #5966
      • PyTorch/XLA PR: #6170
  • [ ] detectron2_maskrcnn_r_101_c4
    • Issue: #5966
      • PyTorch/XLA PR: #6170
  • [ ] detectron2_maskrcnn_r_101_fpn
    • Issue: #5966
      • PyTorch/XLA PR: #6170
  • [ ] detectron2_maskrcnn_r_50_c4
    • Issue: #5966
      • PyTorch/XLA PR: #6170
  • [ ] detectron2_maskrcnn_r_50_fpn
    • Issue: #5966
      • PyTorch/XLA PR: #6170
  • [ ] opacus_cifar10
    • Issue: #5967
  • [ ] tacotron2
    • Issue: #6005
    • Issue: #6010
      • PyTorch/XLA PR: #6060
      • PyTorch/XLA PR: #6071

cc @JackCaoG @miladm

ysiraichi (Nov 28 '23)

State after 7 weeks of work:

Models fixed so far:

  • pyhpc_isoneutral_mixing
  • pyhpc_turbulent_kinetic_energy
  • dlrm
  • Super_SloMo
  • speech_transformer

PRs to fix the models. For an updated list see [XLA, pytorch/benchmarks, pytorch/pytorch]

  • https://github.com/pytorch/xla/pull/5688
  • https://github.com/pytorch/xla/pull/5689
  • https://github.com/pytorch/xla/pull/5707
  • https://github.com/pytorch/xla/pull/5743
  • https://github.com/pytorch/xla/pull/5769
  • https://github.com/pytorch/xla/pull/5823
  • https://github.com/pytorch/xla/pull/5914
  • https://github.com/pytorch/pytorch/pull/112202
  • https://github.com/pytorch/pytorch/pull/114626
  • https://github.com/pytorch/benchmark/pull/2038

PRs in flight. For an updated list see [XLA, pytorch/pytorch, pytorch/benchmarks]

  • https://github.com/pytorch/pytorch/pull/114932
  • https://github.com/pytorch/xla/pull/5922
  • https://github.com/pytorch/xla/pull/5960
  • https://github.com/pytorch/xla/pull/5963
  • https://github.com/pytorch/xla/pull/5939
  • https://github.com/pytorch/benchmark/pull/2072

Issues identified that the PRs in flight do not fix. For an updated list see [XLA, pytorch/pytorch]

  • https://github.com/pytorch/xla/issues/5835
  • https://github.com/pytorch/xla/issues/5837
  • https://github.com/pytorch/xla/issues/5839
  • https://github.com/pytorch/xla/issues/5932
  • https://github.com/pytorch/xla/issues/5942
  • https://github.com/pytorch/pytorch/issues/111033
  • https://github.com/pytorch/pytorch/issues/114302

lezcano (Dec 01 '23)

Weekly update (Dec 1~Dec 10):

Models fixed:

  • DALLE2_pytorch
    • training is now failing with the same error as inductor
  • stable_diffusion_unet
    • training is still failing with OOM
  • stable_diffusion_text_encoder
  • hf_GPT2
  • hf_GPT2_large
    • training without dynamo is still failing
  • yolov3
    • Possibly failing due to a cuDNN error, which is likely an OOM, on an RTX 2060. Haven't tested it on an A100 yet, though.

PRs merged. For an updated list see [XLA, pytorch/benchmarks, pytorch/pytorch]

  • https://github.com/pytorch/xla/pull/5922
  • https://github.com/pytorch/xla/pull/5939
  • https://github.com/pytorch/xla/pull/6060
  • https://github.com/pytorch/xla/pull/6068
  • https://github.com/pytorch/xla/pull/6069
  • https://github.com/pytorch/xla/pull/6071
  • https://github.com/pytorch/benchmark/pull/2072
  • https://github.com/pytorch/pytorch/pull/114932

PRs in flight. For an updated list see [XLA, pytorch/pytorch, pytorch/benchmarks]

  • https://github.com/pytorch/xla/pull/6076
  • https://github.com/pytorch/xla/pull/6067
  • https://github.com/pytorch/xla/pull/6070
  • https://github.com/pytorch/xla/pull/6072

Issues identified that the PRs in flight do not fix. For an updated list see [XLA, pytorch/pytorch]

  • https://github.com/pytorch/xla/issues/5966
  • https://github.com/pytorch/xla/issues/5967
  • https://github.com/pytorch/xla/issues/6003
  • https://github.com/pytorch/xla/issues/6004
  • https://github.com/pytorch/xla/issues/6005
  • https://github.com/pytorch/xla/issues/6008
  • https://github.com/pytorch/xla/issues/6009
  • https://github.com/pytorch/xla/issues/6083
  • https://github.com/pytorch/xla/issues/6085
  • https://github.com/pytorch/xla/issues/6086

ysiraichi (Dec 11 '23)

Weekly update (Dec 11~Dec 15):

Models fixed:

  • pytorch_CycleGAN_and_pix2pix
  • nvidia_deeprecommender
    • dynamo+openxla training is still failing
  • simple_gpt and simple_gpt_tp_manual
    • failing due to the same reasons as inductor
  • moco
    • failing due to distributed backend
  • timm_efficientdet
    • dynamo+openxla training is still failing

PRs merged. For an updated list see [XLA, pytorch/benchmarks, pytorch/pytorch]

  • https://github.com/pytorch/xla/pull/6072
  • https://github.com/pytorch/xla/pull/6076
  • https://github.com/pytorch/xla/pull/6130
  • https://github.com/pytorch/xla/pull/6153
  • https://github.com/pytorch/xla/pull/6182

PRs in flight. For an updated list see [XLA, pytorch/pytorch, pytorch/benchmarks]

  • https://github.com/pytorch/xla/pull/6070
  • https://github.com/pytorch/xla/pull/6160
  • https://github.com/pytorch/xla/pull/6170
  • https://github.com/pytorch/xla/pull/6178
  • https://github.com/pytorch/xla/pull/6180
  • https://github.com/pytorch/pytorch/pull/115924

Issues identified that the PRs in flight do not fix. For an updated list see [XLA, pytorch/pytorch]

  • https://github.com/pytorch/xla/issues/6084
  • https://github.com/pytorch/xla/issues/6112
  • https://github.com/pytorch/pytorch/issues/115900

ysiraichi (Dec 15 '23)

Can we please add a pass rate table in the weekly report that includes:

Inference

  • Inductor, Dynamo+PyTorch/XLA:GPU, Non-Dynamo+PyTorch/XLA:GPU

Training

  • Inductor, Dynamo+PyTorch/XLA:GPU, Non-Dynamo+PyTorch/XLA:GPU

miladm (Jan 10 '24)

Weekly update (Jan 8 ~ Jan 12):

Pass rate (out of 99 benchmarks):

| | Inference | Training |
|---|---|---|
| Inductor | 91 | 64 |
| Non-Dynamo | 87 | 67 |
| Dynamo | 86 | 57 |

Models fixed:

  • detectron2 models (inference with dynamo)
  • hf_BigBird (inference and training with dynamo)
  • torch_multimodal_clip (training with dynamo)
  • timm_vision_transformer (training with dynamo)
  • Likely not due to the merged PRs below:
    • detectron2 models: all but detectron2_fcos_r_50_fpn (training without dynamo)

PRs merged. For an updated list see [XLA, pytorch/benchmarks, pytorch/pytorch]

  • https://github.com/pytorch/pytorch/pull/115924

PRs in flight. For an updated list see [XLA, pytorch/pytorch, pytorch/benchmarks]

  • https://github.com/pytorch/xla/pull/6302
  • https://github.com/pytorch/xla/pull/6296
  • https://github.com/pytorch/xla/pull/6160
  • https://github.com/pytorch/xla/pull/6070

Issues identified that the PRs in flight do not fix. For an updated list see [XLA, pytorch/pytorch]

  • https://github.com/pytorch/xla/issues/6292

ysiraichi (Jan 16 '24)

Weekly update (Jan 15 ~ Jan 19):

Pass rate (out of 99 benchmarks):

| | Inference | Training |
|---|---|---|
| Inductor | 85 | 62 |
| Non-Dynamo | 70 | 57 |
| Dynamo | 71 | 55 |

Models that started failing:

  • After #6296:
    • detectron2_fasterrcnn_r_101_c4
    • detectron2_fasterrcnn_r_101_dc5
    • detectron2_fasterrcnn_r_101_fpn
    • detectron2_fasterrcnn_r_50_c4
    • detectron2_fasterrcnn_r_50_dc5
    • detectron2_fasterrcnn_r_50_fpn
    • detectron2_fcos_r_50_fpn
    • detectron2_maskrcnn_r_101_c4
    • detectron2_maskrcnn_r_101_fpn
    • detectron2_maskrcnn_r_50_c4
    • detectron2_maskrcnn_r_50_fpn
    • mobilenet_v3_large
    • timm_regnet
    • hf_Bart
  • Started being skipped:
    • pytorch_CycleGAN_and_pix2pix
    • pytorch_unet
  • Unsupported precision:
    • pytorch_unet
    • yolov3
  • cuDNN error:
    • Super_SloMo (inductor)

PRs merged. For an updated list see [XLA, pytorch/benchmarks, pytorch/pytorch]

  • https://github.com/pytorch/xla/pull/6302
  • https://github.com/pytorch/xla/pull/6296
  • https://github.com/pytorch/xla/pull/6325

PRs in flight. For an updated list see [XLA, pytorch/pytorch, pytorch/benchmarks]

  • https://github.com/pytorch/xla/pull/6160
  • https://github.com/pytorch/xla/pull/6070

Issues identified that the PRs in flight do not fix. For an updated list see [XLA, pytorch/pytorch]

  • https://github.com/pytorch/xla/issues/6336

ysiraichi (Jan 23 '24)

Can we track separate pass rate tables for L4 and A100 GPUs going forward @ysiraichi?

cc @frgossen @golechwierowicz @cota

miladm (Jan 23 '24)

Weekly update (Jan 22 ~ Jan 26):

Pass rate (out of 99 benchmarks):

| | Inference | Training |
|---|---|---|
| Inductor | 88 | 63 |
| Non-Dynamo | 69 | 57 |
| Dynamo | 72 | 55 |

Models fixed:

  • (inductor) moco
  • (inductor) Super_SloMo
    • Failed when executed with all other benchmarks
    • Passed when executed alone (by specifying --filter argument)
  • (inference) llama_v2_7b_16h

Models that started failing:

  • (inference + non-dynamo) timm_efficientnet (to be fixed by: #6389)
  • (inference + non-dynamo) timm_nfnet (to be fixed by: #6389)

PRs merged. For an updated list see [XLA, pytorch/benchmarks, pytorch/pytorch]

  • https://github.com/pytorch/xla/pull/6350
  • https://github.com/pytorch/xla/pull/6374
  • https://github.com/pytorch/xla/pull/6375
  • https://github.com/pytorch/benchmark/pull/2124
  • https://github.com/pytorch/pytorch/pull/118032

PRs in flight. For an updated list see [XLA, pytorch/pytorch, pytorch/benchmarks]

  • https://github.com/pytorch/xla/pull/6389
  • https://github.com/pytorch/xla/pull/6160
  • https://github.com/pytorch/xla/pull/6070

Issues identified that the PRs in flight do not fix. For an updated list see [XLA, pytorch/pytorch]

  • https://github.com/pytorch/xla/issues/6348
  • https://github.com/pytorch/xla/issues/6353
  • https://github.com/pytorch/xla/issues/6366
  • https://github.com/pytorch/xla/issues/6367
  • https://github.com/pytorch/xla/issues/6380
  • https://github.com/pytorch/xla/issues/6391

ysiraichi (Jan 29 '24)

Weekly update (Jan 29 ~ Feb 2):

Pass rate (out of 99 benchmarks):

A100

| | Inference | Training |
|---|---|---|
| Inductor | 87 (last: 88) | 63 |
| Non-Dynamo | 82 (last: 69) | 56 (last: 57) |
| Dynamo | 82 (last: 72) | 53 (last: 55) |

L4

| | Inference | Training |
|---|---|---|
| Inductor | 86 | 60 |
| Non-Dynamo | 81 | 53 |
| Dynamo | 82 | 49 |

Models Summary (for A100)

  • Inductor: Inference (-4, +3)
    • (fail) New skips by PyTorch's torchbench skip list:
      • detectron2_maskrcnn
      • hf_Bert
      • hf_Bert_large
      • maml
    • (pass) Remove outdated skip:
      • vision_maskrcnn
    • (pass) AMP supported:
      • pytorch_unet
      • yolov3
  • Inductor: Training (-3, +3)
    • (fail) New skips by PyTorch's torchbench skip list:
      • hf_Bert
      • hf_Bert_large
    • (fail) Failing due to sparse error:
      • dlrm
    • (pass) AMP supported:
      • pytorch_unet
    • (pass) No OOM:
      • demucs
      • opacus_cifar10
  • XLA:GPU (non-dynamo): Inference (-3, +16)
    • (fail) New skips by PyTorch's torchbench skip list:
      • detectron2_maskrcnn
      • hf_Bert
      • hf_Bert_large
    • (pass) Forcing fp32 precision (while setting XLA_USE_FP16):
      • detectron2 benchmarks (11)
      • mobilenet_v3_large
      • timm_efficientnet
      • timm_nfnet
      • timm_regnet
    • (pass) AMP supported:
      • yolov3
  • XLA:GPU (non-dynamo): Training (-2, +1)
    • (fail) New skips by PyTorch's torchbench skip list:
      • hf_Bert
      • hf_Bert_large
    • (pass) No OOM:
      • hf_GPT2_large
  • XLA:GPU (dynamo): Inference (-4, +14)
    • (fail) New skips by PyTorch's torchbench skip list:
      • detectron2_maskrcnn
      • hf_Bert
      • hf_Bert_large
      • maml
    • (pass) Remove outdated skip:
      • vision_maskrcnn
    • (pass) Forcing fp32 precision (while setting XLA_USE_FP16):
      • detectron2 benchmarks (11)
      • hf_Bart
    • (pass) AMP supported:
      • yolov3
  • XLA:GPU (dynamo): Training (-2, +0)
    • (fail) New skips by PyTorch's torchbench skip list:
      • hf_Bert
      • hf_Bert_large

PRs merged. For an updated list see [XLA, pytorch/benchmarks, pytorch/pytorch]

  • https://github.com/pytorch/xla/pull/6070
  • https://github.com/pytorch/xla/pull/6160
  • https://github.com/pytorch/xla/pull/6389
  • https://github.com/pytorch/xla/pull/6402
  • https://github.com/pytorch/xla/pull/6407
  • https://github.com/pytorch/xla/pull/6416
  • https://github.com/pytorch/xla/pull/6419
  • https://github.com/pytorch/xla/pull/6421
  • https://github.com/pytorch/xla/pull/6446
  • https://github.com/pytorch/xla/pull/6447

PRs in flight. For an updated list see [XLA, pytorch/pytorch, pytorch/benchmarks]

  • https://github.com/pytorch/pytorch/pull/118783

Issues identified that the PRs in flight do not fix. For an updated list see [XLA, pytorch/pytorch]

  • https://github.com/pytorch/xla/issues/6403
  • https://github.com/pytorch/xla/issues/6404

ysiraichi (Feb 05 '24)

Weekly update (Feb 5 ~ Feb 9):

Pass rate (out of 99 benchmarks):

A100

| | Inference | Training |
|---|---|---|
| Inductor | 87 (last: 87) | 63 |
| Non-Dynamo | 82 (last: 82) | 57 (last: 56) |
| Dynamo | 84 (last: 82) | 53 (last: 53) |

L4

| | Inference | Training |
|---|---|---|
| Inductor | 86 | 60 |
| Non-Dynamo | 81 | 53 |
| Dynamo | 84 | 49 |

Models Summary

  • XLA:GPU (non-dynamo): Training (0, +1)
    • (pass) No OOM:
      • densenet121
  • XLA:GPU (dynamo): Inference (0, +2)
    • (pass) Increased compilation cache (see the sketch after this list):
      • cm3leon_generate
      • hf_T5_generate
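
A minimal sketch of what "increased compilation cache" plausibly refers to here, assuming it is TorchDynamo's recompilation cache limit (generative benchmarks recompile for many sequence lengths). The config name is real, but treating it as the exact change that was made is an assumption.

```python
import torch._dynamo as dynamo

# Assumption: cm3leon_generate and hf_T5_generate trigger many recompilations,
# so the small default cache limit makes Dynamo fall back too early.
# Raising the limit lets more specialized graphs be cached.
dynamo.config.cache_size_limit = 64
```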

PRs merged. For an updated list see [XLA, pytorch/benchmarks, pytorch/pytorch]

  • https://github.com/pytorch/xla/pull/6484
  • https://github.com/pytorch/xla/pull/6491
  • https://github.com/pytorch/xla/pull/6509
  • https://github.com/pytorch/xla/pull/6512
  • https://github.com/pytorch/pytorch/pull/118783

PRs in flight. For an updated list see [XLA, pytorch/pytorch, pytorch/benchmarks]

  • https://github.com/pytorch/xla/pull/6518

Issues identified that the PRs in flight do not fix. For an updated list see [XLA, pytorch/pytorch]

  • https://github.com/pytorch/xla/issues/6483
  • https://github.com/pytorch/xla/issues/6511
  • https://github.com/pytorch/pytorch/issues/119680

ysiraichi (Feb 12 '24)

Weekly update (Feb 12 ~ Feb 16):

Pass rate (out of 99 benchmarks):

Could not run the benchmarks this time, due to a compilation issue: #6564


PRs merged. For an updated list see [XLA, pytorch/benchmarks, pytorch/pytorch]

  • https://github.com/pytorch/xla/pull/6518
  • https://github.com/pytorch/xla/pull/6558
  • https://github.com/pytorch/xla/pull/6550

PRs in flight. For an updated list see [XLA, pytorch/pytorch, pytorch/benchmarks]

  • https://github.com/pytorch/xla/pull/6542
  • https://github.com/pytorch/pytorch/pull/120117

Issues identified that the PRs in flight do not fix. For an updated list see [XLA, pytorch/pytorch]

  • https://github.com/pytorch/xla/issues/6520
  • https://github.com/pytorch/xla/issues/6521
  • https://github.com/pytorch/xla/issues/6540
  • https://github.com/pytorch/xla/issues/6556
  • https://github.com/pytorch/xla/issues/6557
  • https://github.com/pytorch/xla/issues/6564
  • https://github.com/pytorch/pytorch/issues/120115

ysiraichi (Feb 19 '24)

Weekly update (Feb 19 ~ Feb 23):

Pass rate (out of 99 benchmarks):

There was an error in the benchmarking scripts that prevented us from running with XLA: https://github.com/pytorch/xla/pull/6612


PRs merged. For an updated list see [XLA, pytorch/benchmarks, pytorch/pytorch]

  • https://github.com/pytorch/xla/pull/6597
  • https://github.com/pytorch/pytorch/pull/120117
  • https://github.com/pytorch/pytorch/pull/120299

PRs in flight. For an updated list see [XLA, pytorch/pytorch, pytorch/benchmarks]

  • https://github.com/pytorch/xla/pull/6542
  • https://github.com/pytorch/xla/pull/6612
  • https://github.com/pytorch/pytorch/pull/120435

Issues identified that the PRs in flight do not fix. For an updated list see [XLA, pytorch/pytorch]

  • https://github.com/pytorch/pytorch/issues/120336
  • https://github.com/pytorch/pytorch/issues/120585

ysiraichi (Feb 26 '24)

Pass rate (out of 99 benchmarks):

A100

| | Inference | Training |
|---|---|---|
| Inductor | 81 (last: 87) | 65 (last: 63) |
| Non-Dynamo | 72 (last: 82) | 61 (last: 57) |
| Dynamo | 73 (last: 84) | 54 (last: 53) |

L4

| | Inference | Training |
|---|---|---|
| Inductor | 81 (last: 86) | 62 (last: 60) |
| Non-Dynamo | 71 (last: 81) | 57 (last: 53) |
| Dynamo | 73 (last: 84) | 52 (last: 49) |

Models Summary

  • Inductor: Inference (-10, +4)

    • (fail) "roi_align_forward_kernel" not implemented for 'BFloat16' (after: #6518)
      • detectron2 benchmarks (10)
    • (pass) Remove outdated skips
      • hf_Bert and hf_Bert_large
      • maml
      • pytorch_CycleGAN_and_pix2pix
  • Inductor: Training (-3, +5)

    • (fail) Running on AMP (after: #6518)
      • mobilenet_v2_quantized_qat
      • resnet50_quantized_qat
    • (pass) Remove outdated skips
      • hf_Bert and hf_Bert_large
      • pytorch_CycleGAN_and_pix2pix
  • XLA:GPU (non-dynamo): Inference (-15, +5)

    • (fail) Error while lowering: aten::upsample_bilinear2d (after: #6518) (issue: #6520)
      • Background_Matting
    • (fail) CPU fallback does not work with mixed dtypes (issue: #6336)
      • detectron2 benchmarks (11)
    • (fail) Seen floating point types of different precisions in HLO (after: #6518) (issue: #6521)
      • hf_GPT2 and hf_GPT2_large
    • (fail) Indices types are not Long (they are Int) (after: #6518) (issue: #6648)
      • llama
    • (pass) Remove outdated skips
      • hf_Bert and hf_Bert_large
      • maml
      • pytorch_CycleGAN_and_pix2pix
      • pytorch_unet
  • XLA:GPU (non-dynamo): Training (0, +4)

    • (pass) Remove outdated skips
      • hf_Bert and hf_Bert_large
      • pytorch_CycleGAN_and_pix2pix
      • pytorch_unet
  • XLA:GPU (dynamo): Inference (-16, +5)

    • (fail) expected scalar type Float but found Half (after: #6518) (issue: #6556)
      • Super_SloMo
    • (fail) CPU fallback does not work with mixed dtypes (issue: #6336)
      • detectron2 benchmarks (11)
    • (fail) Seen floating point types of different precisions in HLO (after: #6518) (issue: #6521)
      • hf_GPT2 and hf_GPT2_large
    • (fail) Indices types are not Long (they are Int) (after: #6518) (issue: #6648)
      • llama
    • (fail) Slice size at index 0 in gather op is out of range, must be within [0, 1), got 1. (issue: #6557)
      • vision_maskrcnn
  • XLA:GPU (dynamo): Training (-4, +5)

    • (fail) expected scalar type Float but found Half (after: #6518) (issue: #6556)
      • Super_SloMo
    • (fail) Seen floating point types of different precisions in HLO (after: #6518)
      • hf_GPT2 and hf_GPT2_large (issue: #6521)
      • timm_nfnet (issue: #6649)
    • (pass) Remove outdated skips
      • hf_Bert and hf_Bert_large
      • pytorch_CycleGAN_and_pix2pix
      • pytorch_unet
    • (pass) No OOM
      • stable_diffusion_unet

ysiraichi (Feb 27 '24)

Weekly update (Feb 26 ~ Mar 01):

Pass rate (out of 99 benchmarks):

A100

| | Inference | Training |
|---|---|---|
| Inductor | 81 (last: 81) | 65 (last: 65) |
| Non-Dynamo | 72 (last: 72) | 61 (last: 61) |
| Dynamo | 73 (last: 73) | 56 (last: 54) |

L4

| | Inference | Training |
|---|---|---|
| Inductor | 81 (last: 81) | 63 (last: 62) |
| Non-Dynamo | 72 (last: 71) | 58 (last: 57) |
| Dynamo | 71 (last: 73) | 54 (last: 52) |

Models Summary

  • XLA:GPU (non-dynamo): Training (-1, +1)

    • (fail) Timeout:
      • timm_efficientdet
    • (pass) Smaller batch size
      • demucs
  • XLA:GPU (dynamo): Inference (-2, 0)

    • (fail) Timeout:
      • cm3leon_generate
      • hf_T5_generate
  • XLA:GPU (dynamo): Training (0, +2)

    • (pass) Smaller batch size
      • densenet121
      • timm_efficientdet

PRs merged. For an updated list see [XLA, pytorch/benchmarks, pytorch/pytorch]

  • https://github.com/pytorch/xla/pull/6542
  • https://github.com/pytorch/xla/pull/6612
  • https://github.com/pytorch/xla/pull/6632

PRs in flight. For an updated list see [XLA, pytorch/pytorch, pytorch/benchmarks]

  • https://github.com/pytorch/xla/pull/6659
  • https://github.com/pytorch/xla/pull/6624
  • https://github.com/pytorch/xla/pull/6661
  • https://github.com/pytorch/pytorch/pull/120435
  • https://github.com/pytorch/pytorch/pull/121007
  • https://github.com/pytorch/pytorch/pull/121074
  • https://github.com/pytorch/pytorch/pull/121075

Issues identified that the PRs in flight do not fix. For an updated list see [XLA, pytorch/pytorch]

  • https://github.com/pytorch/xla/pull/6648
  • https://github.com/pytorch/xla/pull/6649

ysiraichi (Mar 04 '24)

Weekly update (Mar 04 ~ Mar 08):

Pass rate (out of 99 benchmarks):

A100

| | Inference | Training |
|---|---|---|
| Inductor | 81 (last: 81) | 66 (last: 65) |
| Non-Dynamo | 72 (last: 72) | 61 (last: 61) |
| Dynamo | 71 (last: 71) | 57 (last: 56) |

L4

| | Inference | Training |
|---|---|---|
| Inductor | 81 (last: 81) | 64 (last: 63) |
| Non-Dynamo | 72 (last: 72) | 58 (last: 58) |
| Dynamo | 71 (last: 71) | 55 (last: 54) |

Models Summary (A100)

  • Inductor: Training (0, +1)

    • (pass) Reason unknown
      • dlrm
  • XLA:GPU (dynamo): Training (0, +1)

    • (pass) Tensor.new dynamo support (see the sketch after this list)
      • hf_Reformer
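
For reference, a small illustration of what Tensor.new does (the legacy constructor pattern whose Dynamo support unblocked hf_Reformer); the tensors here are placeholders, and the point is only that the result inherits the source tensor's dtype and device.

```python
import torch

x = torch.ones(2, 3, dtype=torch.float64)

# Tensor.new builds a new tensor with the same dtype and device as `x`.
y = x.new([[1.0, 2.0, 3.0]])
print(y.dtype)   # torch.float64

z = x.new(2, 2)  # uninitialized 2x2 tensor, same dtype/device as `x`
print(z.shape)   # torch.Size([2, 2])
```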

PRs merged. For an updated list see [XLA, pytorch/benchmarks, pytorch/pytorch]

  • https://github.com/pytorch/xla/pull/6624
  • https://github.com/pytorch/pytorch/pull/121075

PRs in flight. For an updated list see [XLA, pytorch/pytorch, pytorch/benchmarks]

  • https://github.com/pytorch/xla/pull/6659
  • https://github.com/pytorch/xla/pull/6661
  • https://github.com/pytorch/xla/pull/6697
  • https://github.com/pytorch/pytorch/pull/121007

Issues identified that the PRs in flight do not fix. For an updated list see [XLA, pytorch/pytorch]

ysiraichi (Mar 11 '24)

Weekly update (Mar 11 ~ Mar 15):

Pass rate (out of 99 benchmarks):

A100

| | Inference | Training |
|---|---|---|
| Inductor | 81 (last: 81) | 66 (last: 66) |
| Non-Dynamo | 37 (last: 72) | 28 (last: 61) |
| Dynamo | 31 (last: 71) | 18 (last: 57) |

L4

| | Inference | Training |
|---|---|---|
| Inductor | 81 (last: 81) | 64 (last: 63) |
| Non-Dynamo | 45 (last: 72) | 38 (last: 58) |
| Dynamo | 44 (last: 71) | 22 (last: 55) |

Models Summary (A100)

No summary this week because:

  • Diff is too big
  • It might be due to a pin update

PRs merged. For an updated list see [XLA, pytorch/benchmarks, pytorch/pytorch]

  • https://github.com/pytorch/xla/pull/6718
  • https://github.com/pytorch/xla/pull/6745
  • https://github.com/pytorch/xla/pull/6697

PRs in flight. For an updated list see [XLA, pytorch/pytorch, pytorch/benchmarks]

  • https://github.com/pytorch/xla/pull/6659
  • https://github.com/pytorch/xla/pull/6661
  • https://github.com/pytorch/pytorch/pull/121007

Issues identified that the PRs in flight do not fix. For an updated list see [XLA, pytorch/pytorch]

  • https://github.com/pytorch/xla/pull/6750
  • https://github.com/pytorch/pytorch/pull/121926

ysiraichi (Mar 19 '24)

@ysiraichi The regression you saw might be due to https://github.com/pytorch/xla/pull/6677 (OpenXLA pin update). Our team is looking into this issue.

vanbasten23 (Mar 21 '24)

Weekly update (Mar 18 ~ Mar 21):

Pass rate (out of 99 benchmarks):

A100

| | Inference | Training |
|---|---|---|
| Inductor | 81 (last: 81) | 66 (last: 66) |
| Non-Dynamo | 76 (last: 72) | 64 (last: 61) |
| Dynamo | 73 (last: 71) | 58 (last: 57) |

L4

| | Inference | Training |
|---|---|---|
| Inductor | 80 (last: 81) | 64 (last: 64) |
| Non-Dynamo | 76 (last: 72) | 61 (last: 58) |
| Dynamo | 74 (last: 71) | 56 (last: 55) |

Models Summary (A100)

  • XLA:GPU (non-dynamo): Inference (0, +4)

    • (pass) as_strided_copy new implementation
      • hf_Longformer
    • (pass) pow data-type promotion fixed
      • hf_GPT2
      • hf_GPT2_large
    • (pass) Loosen Embedding index type requirement
      • llama
  • XLA:GPU (non-dynamo): Training (0, +3)

    • (pass) as_strided_copy new implementation
      • hf_Longformer
    • (pass) Unknown reason:
      • hf_T5_base
      • timm_efficientdet
  • XLA:GPU (dynamo): Inference (-2, +4)

    • (pass) as_strided_copy new implementation
      • hf_Longformer
    • (pass) pow data-type promotion fixed
      • hf_GPT2
      • hf_GPT2_large
    • (pass) Loosen Embedding index type requirement
      • llama
    • (fail) Unknown reason:
      • doctr_reco_predictor https://github.com/pytorch/xla/issues/6832
      • speech_transformer https://github.com/pytorch/xla/issues/6831
  • XLA:GPU (dynamo): Training (-2, +3)

    • (pass) as_strided_copy new implementation
      • hf_Longformer
    • (pass) pow data-type promotion fixed
      • hf_GPT2
      • hf_GPT2_large
    • (fail) Unknown reason:
      • hf_Reformer https://github.com/pytorch/xla/issues/6830
      • speech_transformer https://github.com/pytorch/xla/issues/6831

PRs merged. For an updated list see [XLA, pytorch/benchmarks, pytorch/pytorch]

PRs in flight. For an updated list see [XLA, pytorch/pytorch, pytorch/benchmarks]

  • https://github.com/pytorch/xla/pull/6659
  • https://github.com/pytorch/xla/pull/6661
  • https://github.com/pytorch/xla/pull/6814
  • https://github.com/pytorch/pytorch/pull/121007

Issues identified that the PRs in flight do not fix. For an updated list see [XLA, pytorch/pytorch]

ysiraichi (Mar 25 '24)

Last week, the results were unchanged. We are preparing for performance optimizations. cc @ysiraichi

miladm (Apr 01 '24)

Weekly update (Apr 1 ~ Apr 5):

Pass rate (out of 99 benchmarks):

A100

| | Inference | Training |
|---|---|---|
| Inductor | 81 (last: 81) | 66 (last: 66) |
| Non-Dynamo | 75 (last: 76) | 63 (last: 64) |
| Dynamo | 73 (last: 73) | 53 (last: 58) |

L4

| | Inference | Training |
|---|---|---|
| Inductor | 82 (last: 80) | 65 (last: 64) |
| Non-Dynamo | 75 (last: 76) | 61 (last: 61) |
| Dynamo | 74 (last: 74) | 51 (last: 56) |

Models Summary (A100)

  • Inductor: Inference (-1, +1)

    • (pass) dlrm
    • (fail) maml
  • XLA:GPU (non-dynamo): Inference (-1, 0)

    • (fail) timm_efficientdet https://github.com/pytorch/xla/issues/6889
  • XLA:GPU (non-dynamo): Training (-1, 0)

    • (fail) timm_efficientdet: OOM
  • XLA:GPU (dynamo): Inference (-1, +1)

    • (pass) speech_transformer
    • (fail) timm_efficientdet https://github.com/pytorch/xla/issues/6899
  • XLA:GPU (dynamo): Training (-7, +2)

    • (pass) hf_Reformer and speech_transformer
    • (fail) hf_GPT2 and hf_GPT2_large https://github.com/pytorch/xla/issues/6900
    • (fail) hf_T5, hf_T5_base, stable_diffusion_unet, and timm_vision_transformer_large: OOM
    • (fail) hf_T5_large https://github.com/pytorch/xla/issues/6901

PRs merged. For an updated list see [XLA, pytorch/benchmarks, pytorch/pytorch]

  • https://github.com/pytorch/xla/pull/6814
  • https://github.com/pytorch/xla/pull/6881

PRs in flight. For an updated list see [XLA, pytorch/pytorch, pytorch/benchmarks]

  • https://github.com/pytorch/xla/pull/6659
  • https://github.com/pytorch/xla/pull/6661
  • https://github.com/pytorch/pytorch/pull/121007

Issues identified that the PRs in flight do not fix. For an updated list see [XLA, pytorch/pytorch]

  • https://github.com/pytorch/xla/pull/6833
  • https://github.com/pytorch/xla/pull/6899
  • https://github.com/pytorch/xla/pull/6900
  • https://github.com/pytorch/xla/pull/6901

ysiraichi (Apr 08 '24)

Weekly update (Apr 8 ~ Apr 12):

Pass rate (out of 99 benchmarks):

A100

| | Inference | Training |
|---|---|---|
| Inductor | 81 (last: 81) | 66 (last: 66) |
| Non-Dynamo | 74 (last: 75) | 64 (last: 63) |
| Dynamo | 74 (last: 73) | 53 (last: 53) |

L4

| | Inference | Training |
|---|---|---|
| Inductor | 82 (last: 82) | 65 (last: 65) |
| Non-Dynamo | 75 (last: 75) | 61 (last: 61) |
| Dynamo | 75 (last: 74) | 51 (last: 51) |

Models Summary (A100)

  • XLA:GPU (non-dynamo): Inference (-1, 0)

    • (fail) doctr_reco_predictor: TIMEOUT
  • XLA:GPU (non-dynamo): Training (0, +1)

    • (pass) timm_efficientdet
  • XLA:GPU (dynamo): Inference (0, +1)

    • (pass) hf_Reformer

PRs merged. For an updated list see [XLA, pytorch/benchmarks, pytorch/pytorch]

  • https://github.com/pytorch/xla/pull/6659
  • https://github.com/pytorch/xla/pull/6661
  • https://github.com/pytorch/pytorch/pull/121007

PRs in flight. For an updated list see [XLA, pytorch/pytorch, pytorch/benchmarks]

Issues identified that the PRs in flight do not fix. For an updated list see [XLA, pytorch/pytorch]

ysiraichi (Apr 15 '24)

Weekly update (Apr 15 ~ Apr 19):

Pass rate (out of 99 benchmarks):

A100

| | Inference | Training |
|---|---|---|
| Inductor | ? (last: 81) | ? (last: 66) |
| Non-Dynamo | ? (last: 74) | ? (last: 64) |
| Dynamo | ? (last: 74) | ? (last: 53) |

L4

| | Inference | Training |
|---|---|---|
| Inductor | 82 (last: 82) | 65 (last: 65) |
| Non-Dynamo | 76 (last: 75) | 61 (last: 61) |
| Dynamo | 76 (last: 75) | 51 (last: 51) |

Models Summary (A100)


PRs merged. For an updated list see [XLA, pytorch/benchmarks, pytorch/pytorch]

  • https://github.com/pytorch/xla/pull/6933

PRs in flight. For an updated list see [XLA, pytorch/pytorch, pytorch/benchmarks]

Issues identified that the PRs in flight do not fix. For an updated list see [XLA, pytorch/pytorch]

ysiraichi (Apr 22 '24)

Weekly update (Apr 22 ~ Apr 26):

Pass rate (out of 99 benchmarks):

A100

| | Inference | Training |
|---|---|---|
| Inductor | 81 (last: 81) | 66 (last: 66) |
| Non-Dynamo | 75 (last: 74) | 64 (last: 64) |
| Dynamo | 75 (last: 74) | 53 (last: 53) |

L4

| | Inference | Training |
|---|---|---|
| Inductor | 81 (last: 82) | 65 (last: 65) |
| Non-Dynamo | 76 (last: 76) | 61 (last: 61) |
| Dynamo | 76 (last: 76) | 51 (last: 51) |

Models Summary (A100)

  • XLA:GPU (non-dynamo): Inference (0, +1)

    • (pass) timm_efficientdet
  • XLA:GPU (dynamo): Inference (0, +1)

    • (pass) timm_efficientdet

PRs merged. For an updated list see [XLA, pytorch/benchmarks, pytorch/pytorch]

PRs in flight. For an updated list see [XLA, pytorch/pytorch, pytorch/benchmarks]

  • https://github.com/pytorch/xla/pull/6958

Issues identified that the PRs in flight do not fix. For an updated list see [XLA, pytorch/pytorch]

  • https://github.com/pytorch/xla/issues/6988

ysiraichi (Apr 28 '24)

Weekly update (Apr 29 ~ May 3):

Pass rate (out of 99 benchmarks):

A100

| | Inference | Training |
|---|---|---|
| Inductor | 81 (last: 81) | 66 (last: 66) |
| Non-Dynamo | 76 (last: 75) | 64 (last: 64) |
| Dynamo | 75 (last: 75) | 53 (last: 53) |

L4

| | Inference | Training |
|---|---|---|
| Inductor | 82 (last: 81) | 65 (last: 65) |
| Non-Dynamo | 76 (last: 76) | 61 (last: 61) |
| Dynamo | 76 (last: 76) | 51 (last: 51) |

Models Summary (A100)

  • XLA:GPU (non-dynamo): Inference (0, +1)
    • (pass) doctr_reco_predictor

PRs merged. For an updated list see [XLA, pytorch/benchmarks, pytorch/pytorch]

  • https://github.com/pytorch/xla/pull/6958

PRs in flight. For an updated list see [XLA, pytorch/pytorch, pytorch/benchmarks]

Issues identified that the PRs in flight do not fix. For an updated list see [XLA, pytorch/pytorch]

ysiraichi (May 04 '24)

Weekly update (May 6 ~ May 10):

Pass rate (out of 99 benchmarks):

A100

| | Inference | Training |
|---|---|---|
| Inductor | 82 (last: 81) | 66 (last: 66) |
| Non-Dynamo | 76 (last: 75) | 64 (last: 64) |
| Dynamo | 75 (last: 75) | 53 (last: 53) |

L4

| | Inference | Training |
|---|---|---|
| Inductor | 82 (last: 82) | 65 (last: 65) |
| Non-Dynamo | 76 (last: 76) | 61 (last: 61) |
| Dynamo | 76 (last: 76) | 51 (last: 51) |

Notes

  • Inductor on L4 started failing with: SyntaxError: unterminated string literal
    • Oddly enough, A100 didn't have the same error
    • Didn't update the results of L4

Models Summary (A100)

  • Inductor: Inference (0, +1)
    • (pass) maml

PRs merged. For an updated list see [XLA, pytorch/benchmarks, pytorch/pytorch]

  • https://github.com/pytorch/pytorch/pull/125876

PRs in flight. For an updated list see [XLA, pytorch/pytorch, pytorch/benchmarks]

Issues identified that the PRs in flight do not fix. For an updated list see [XLA, pytorch/pytorch]

ysiraichi (May 13 '24)

Weekly update (May 13 ~ May 17):

Pass rate (out of 99 benchmarks):

A100

| | Inference | Training |
|---|---|---|
| Inductor | 82 (last: 82) | 66 (last: 66) |
| Non-Dynamo | 77 (last: 76) | 61 (last: 64) |
| Dynamo | 78 (last: 75) | 55 (last: 53) |

L4

| | Inference | Training |
|---|---|---|
| Inductor | 82 (last: 82) | 65 (last: 65) |
| Non-Dynamo | 77 (last: 76) | 59 (last: 61) |
| Dynamo | 78 (last: 76) | 52 (last: 51) |

Models Summary (A100)

All the differences shown below are likely the result of #7067, which fixes AMP. Reason: (i) training benchmarks use AMP by default; and (ii) some inference benchmarks use AMP instead of bfloat16. A minimal AMP sketch follows the list below.

  • XLA:GPU (non-dynamo): Inference (0, +1)

    • (pass) detectron2_fcos_r_50_fpn
  • XLA:GPU (non-dynamo): Training (-5, +2)

    • (fail) Super_SloMo
    • (fail) mobilenet_v2_quantized_qat
    • (fail) resnet50_quantized_qat
    • (fail) timm_efficientdet
    • (fail) timm_nfnet
    • (pass) stable_diffusion_unet
    • (pass) timm_vision_transformer_large
  • XLA:GPU (dynamo): Inference (0, +3)

    • (pass) Super_SloMo
    • (pass) detectron2_fcos_r_50_fpn
    • (pass) doctr_reco_predictor
  • XLA:GPU (dynamo): Training (0, +2)

    • (pass) Super_SloMo
    • (pass) timm_nfnet
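
Since the differences above hinge on AMP, here is a minimal sketch of the AMP training-step pattern involved, written as generic torch.autocast/GradScaler usage on CUDA; the XLA:GPU path exercised by #7067 is analogous but not shown verbatim, and the model, data, and optimizer below are placeholders.

```python
import torch

model = torch.nn.Linear(16, 4).cuda()  # placeholder model
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
scaler = torch.cuda.amp.GradScaler()
x = torch.randn(8, 16, device="cuda")
target = torch.randn(8, 4, device="cuda")

# Forward pass under autocast: eligible ops run in float16 instead of float32.
with torch.autocast(device_type="cuda", dtype=torch.float16):
    loss = torch.nn.functional.mse_loss(model(x), target)

# Scaled backward pass avoids underflowing float16 gradients.
scaler.scale(loss).backward()
scaler.step(optimizer)
scaler.update()
```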

PRs merged. For an updated list see [XLA, pytorch/benchmarks, pytorch/pytorch]

  • https://github.com/pytorch/xla/pull/7067

PRs in flight. For an updated list see [XLA, pytorch/pytorch, pytorch/benchmarks]

  • https://github.com/pytorch/xla/pull/7080
  • https://github.com/pytorch/xla/pull/7081

Issues identified that the PRs in flight do not fix. For an updated list see [XLA, pytorch/pytorch]

ysiraichi (May 20 '24)

Weekly update (May 20 ~ May 24):

Pass rate (out of 99 benchmarks):

A100

| | Inference | Training |
|---|---|---|
| Inductor | 82 (last: 82) | 66 (last: 66) |
| Non-Dynamo | 77 (last: 77) | 63 (last: 61) |
| Dynamo | 78 (last: 78) | 55 (last: 55) |

L4

| | Inference | Training |
|---|---|---|
| Inductor | 82 (last: 82) | 65 (last: 65) |
| Non-Dynamo | 77 (last: 77) | 61 (last: 59) |
| Dynamo | 78 (last: 78) | 52 (last: 52) |

Models Summary (A100)

  • XLA:GPU (non-dynamo): Training (-5, +2)
    • (pass) Super_SloMo #7067
    • (pass) timm_efficientdet #7091

PRs merged. For an updated list see [XLA, pytorch/benchmarks, pytorch/pytorch]

  • https://github.com/pytorch/xla/pull/7080
  • https://github.com/pytorch/xla/pull/7081
  • https://github.com/pytorch/xla/pull/7090
  • https://github.com/pytorch/xla/pull/7091

PRs in flight. For an updated list see [XLA, pytorch/pytorch, pytorch/benchmarks]

  • https://github.com/pytorch/xla/pull/7111
  • https://github.com/pytorch/xla/pull/7113
  • https://github.com/pytorch/xla/pull/7116

Issues identified that the PRs in flight do not fix. For an updated list see [XLA, pytorch/pytorch]

  • https://github.com/pytorch/xla/issues/7095

ysiraichi (May 25 '24)

Weekly update (May 27 ~ May 29):

PRs merged. For an updated list see [XLA, pytorch/benchmarks, pytorch/pytorch]

  • https://github.com/pytorch/xla/pull/7111
  • https://github.com/pytorch/xla/pull/7113
  • https://github.com/pytorch/xla/pull/7116
  • https://github.com/pytorch/xla/pull/7130

PRs in flight. For an updated list see [XLA, pytorch/pytorch, pytorch/benchmarks]

  • https://github.com/pytorch/xla/pull/7168

Issues identified that the PRs in flight do not fix. For an updated list see [XLA, pytorch/pytorch]

ysiraichi (Jun 03 '24)

Weekly update (June 3 ~ June 6):

Pass rate (out of 99 benchmarks):

A100

| | Inference | Training |
|---|---|---|
| Inductor | 82 (last: 82) | 65 (last: 66) |
| Non-Dynamo | 79 (last: 77) | 61 (last: 63) |
| Dynamo | 79 (last: 78) | 55 (last: 55) |

L4

| | Inference | Training |
|---|---|---|
| Inductor | 82 (last: 82) | 64 (last: 65) |
| Non-Dynamo | 79 (last: 77) | 60 (last: 61) |
| Dynamo | 79 (last: 78) | 52 (last: 52) |

Models Summary (A100)

  • Inductor: Training (-1, +0)

    • (fail) dlrm
  • XLA:GPU (non-dynamo): Inference (-0, +2)

  • XLA:GPU (non-dynamo): Training (-3, +1)

    • (pass) timm_nfnet #7130
    • (fail) drq #7247
    • (fail) stable_diffusion_unet: OOM
    • (fail) timm_vision_transformer_large: OOM
  • XLA:GPU (dynamo): Inference (-0, +1)


PRs merged. For an updated list see [XLA, pytorch/benchmarks, pytorch/pytorch]

  • https://github.com/pytorch/xla/pull/7168

Issues identified that the PRs in flight do not fix. For an updated list see [XLA, pytorch/pytorch]

  • https://github.com/pytorch/xla/issues/7198
  • https://github.com/pytorch/pytorch/issues/128165

ysiraichi (Jun 10 '24)

Weekly update (June 10 ~ June 14):

Pass rate (out of 99 benchmarks):

A100

| | Inference | Training |
|---|---|---|
| Inductor | 82 (last: 82) | 65 (last: 65) |
| Non-Dynamo | 79 (last: 79) | 63 (last: 61) |
| Dynamo | 79 (last: 79) | 55 (last: 55) |

L4

| | Inference | Training |
|---|---|---|
| Inductor | 82 (last: 82) | 64 (last: 64) |
| Non-Dynamo | 79 (last: 79) | 61 (last: 60) |
| Dynamo | 79 (last: 79) | 52 (last: 52) |

Models Summary (A100)

  • XLA:GPU (non-dynamo): Training (-1, +3)
    • (pass) drq
    • (pass) stable_diffusion_unet
    • (pass) timm_vision_transformer_large
    • (fail) timm_nfnet #7271

PRs merged. For an updated list see [XLA, pytorch/benchmarks, pytorch/pytorch]

PRs in flight. For an updated list see [XLA, pytorch/pytorch, pytorch/benchmarks]

  • https://github.com/pytorch/xla/pull/7257
  • https://github.com/pytorch/benchmark/pull/2292

Issues identified that the PRs in flight do not fix. For an updated list see [XLA, pytorch/pytorch]

ysiraichi (Jun 17 '24)