
Failing Torchbench Models: tracking issue

ysiraichi opened this issue • 32 comments

Summary of Contributions (9th Feb)

  1. Increase the number of TorchBench models that work with Dynamo as a tracer: the pass rates are now comparable to those of torch.compile using Inductor. Some of the fixes also improved the previous (non-Dynamo) tracer that PyTorch/XLA used.

    | | Inference | Training |
    |---|---|---|
    | Inductor | 87 | 63 |
    | Dynamo | 60 to 82 | 41 to 53 |
    | Non-Dynamo | 79 to 82 | 54 to 56 |
  2. Improve the benchmarking tools used by Google: the initial Google runs benchmarking these models showed a discrepancy of about 15 models relative to the results reported here. We identified and fixed 10+ issues, which helped reconcile Google's benchmarks with the reported ones and, in turn, with the PyTorch HUD.

Current State

This post has two lists:

  • Failing inference models
  • Failing training models

Each list shows the models that fail when:

  • Tracing without Dynamo (eager mode)
  • Tracing with Dynamo into openxla (Dynamo+openxla)
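
For readers less familiar with the two modes, here is a minimal sketch of how each one traces a workload. It is not taken from the benchmarking scripts; the model and input are placeholders.

```python
import torch
import torch_xla.core.xla_model as xm

device = xm.xla_device()
model = torch.nn.Linear(8, 8).to(device)  # placeholder model
x = torch.randn(4, 8, device=device)      # placeholder input

# Non-Dynamo (eager mode): operations are recorded lazily, and the accumulated
# graph is compiled and executed by XLA when mark_step() is reached.
out = model(x)
xm.mark_step()

# Dynamo + openxla: torch.compile captures the graph up front and hands it to
# the openxla backend for compilation.
compiled_model = torch.compile(model, backend="openxla")
out = compiled_model(x)
```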

These lists were created using the benchmarking scripts that currently live upstream, by running the following command:

python xla/benchmarks/experiment_runner.py \
       --suite-name torchbench \
       --accelerator cuda \
       --xla PJRT --xla None \
       --dynamo openxla --dynamo inductor --dynamo None \
       --test eval --test train \
       --repeat 30 --iterations-per-run 5 \
       --print-subprocess \
       --no-resume

Environment

  • GPU: A100 40GB

Inference

Non-Dynamo. Pass rate: 87/99 (87%)

  • [x] DALLE2_pytorch
    • Issue: #6010
      • PyTorch/XLA PR: #6060
      • PyTorch/XLA PR: #6071
  • [ ] cm3leon_generate
    • Issue: #6004
  • [ ] hf_Longformer
    • Issue: #5835
  • [ ] hf_T5_generate
    • Issue: #6004
  • [ ] moco
    • Issue: #6083
    • Issue: #6011
      • PyTorch/XLA PR: #6296
      • PyTorch/XLA PR: #6076
    • Issue: #6010
      • PyTorch/XLA PR: #6060
      • PyTorch/XLA PR: #6071
  • [x] nvidia_deeprecommender
    • Issue: #6006
      • PyTorch/XLA PR: #6296
      • PyTorch/XLA PR: #6076
  • [x] pytorch_CycleGAN_and_pix2pix
    • Issue: #6007
      • PyTorch/XLA PR: #6296
      • PyTorch/XLA PR: #6076
  • [ ] simple_gpt
    • RTX 2060 doesn't support BF16
    • Issue: #6011
      • PyTorch/XLA PR: #6296
      • PyTorch/XLA PR: #6076
    • Issue: #6010
      • PyTorch/XLA PR: #6060
      • PyTorch/XLA PR: #6071
  • [ ] simple_gpt_tp_manual
    • RTX 2060 doesn't support BF16
    • Issue: #6011
      • PyTorch/XLA PR: #6296
      • PyTorch/XLA PR: #6076
    • Issue: #6010
      • PyTorch/XLA PR: #6060
      • PyTorch/XLA PR: #6071
  • [ ] tacotron2
    • Issue: #6112
    • Issue: #6010
      • PyTorch/XLA PR: #6060
      • PyTorch/XLA PR: #6071
  • [x] timm_efficientdet
    • Issue: #6011
      • PyTorch/XLA PR: #6296
      • PyTorch/XLA PR: #6076
    • Issue: #6010
      • PyTorch/XLA PR: #6060
      • PyTorch/XLA PR: #6071
  • [ ] vision_maskrcnn
    • PyTorch/XLA PR: #5743
    • PyTorch PR: https://github.com/pytorch/pytorch/pull/112202
    • Skipped because of incompatible model and experiment configurations

Dynamo+openxla. Pass rate: 86/99 (86%)

  • [x] DALLE2_pytorch
    • Issue: #6010
      • PyTorch/XLA PR: #6060
      • PyTorch/XLA PR: #6071
  • [x] Super_SloMo
    • PyTorch/XLA PR: #5707
    • PyTorch/benchmark PR: https://github.com/pytorch/benchmark/pull/2038
  • [ ] cm3leon_generate
    • Issue: #5967
  • [x] detectron2_fasterrcnn_r_101_c4
    • Issue: #5966
      • PyTorch/XLA PR: #6170
  • [x] detectron2_fasterrcnn_r_101_dc5
    • Issue: #5966
      • PyTorch/XLA PR: #6170
  • [x] detectron2_fasterrcnn_r_101_fpn
    • Issue: #5966
      • PyTorch/XLA PR: #6170
  • [x] detectron2_fasterrcnn_r_50_c4
    • Issue: #5966
      • PyTorch/XLA PR: #6170
  • [x] detectron2_fasterrcnn_r_50_dc5
    • Issue: #5966
      • PyTorch/XLA PR: #6170
  • [x] detectron2_fasterrcnn_r_50_fpn
    • Issue: #5966
      • PyTorch/XLA PR: #6170
  • [x] detectron2_fcos_r_50_fpn
    • Issue: #5966
      • PyTorch/XLA PR: #6170
  • [x] detectron2_maskrcnn
    • Issue: #5966
      • PyTorch/XLA PR: #6170
  • [x] detectron2_maskrcnn_r_101_c4
    • Issue: #5966
      • PyTorch/XLA PR: #6170
  • [x] detectron2_maskrcnn_r_101_fpn
    • Issue: #5966
      • PyTorch/XLA PR: #6170
  • [x] detectron2_maskrcnn_r_50_c4
    • Issue: #5966
      • PyTorch/XLA PR: #6170
  • [x] detectron2_maskrcnn_r_50_fpn
    • Issue: #5966
      • PyTorch/XLA PR: #6170
  • [x] dlrm
    • PyTorch/XLA PR: #5743
    • PyTorch PR: https://github.com/pytorch/pytorch/pull/112202
  • [x] hf_BigBird
    • Issue: #5966
      • PyTorch/XLA PR: #6170
  • [x] hf_GPT2
    • PyTorch/XLA PR: #5922
  • [x] hf_GPT2_large
    • PyTorch/XLA PR: #5922
  • [ ] hf_Longformer
    • Issue: #5835
  • [ ] hf_Reformer
    • Issue: #5837
  • [ ] hf_T5_generate
    • Issue: #5967
  • [ ] moco
    • Issue: #6083
    • Issue: #6011
      • PyTorch/XLA PR: #6296
      • PyTorch/XLA PR: #6076
    • Issue: #6010
      • PyTorch/XLA PR: #6060
      • PyTorch/XLA PR: #6071
  • [x] nvidia_deeprecommender
    • Issue: #6006
      • PyTorch/XLA PR: #6296
      • PyTorch/XLA PR: #6076
  • [x] pyhpc_isoneutral_mixing
    • PyTorch/XLA PR: #5743
    • PyTorch PR: https://github.com/pytorch/pytorch/pull/112202
  • [x] pyhpc_turbulent_kinetic_energy
    • PyTorch/XLA PR: #5743
    • PyTorch PR: https://github.com/pytorch/pytorch/pull/112202
  • [x] pytorch_CycleGAN_and_pix2pix
    • Issue: #6007
      • PyTorch/XLA PR: #6296
      • PyTorch/XLA PR: #6076
  • [x] speech_transformer
    • PyTorch/XLA PR: #5823
  • [x] timm_efficientdet
    • Issue: #6011
      • PyTorch/XLA PR: #6296
      • PyTorch/XLA PR: #6076
    • Issue: #6010
      • PyTorch/XLA PR: #6060
      • PyTorch/XLA PR: #6071

Models also Failing on Inductor

Inference Failing on Inductor CUDA with the Same Error

Benchmarks that raise the same error on inductor:

  • [ ] hf_clip
    • 'str' object has no attribute 'shape'
  • [ ] mobilenet_v2_quantized_qat
  • [ ] resnet50_quantized_qat

Inference Failing on Inductor CUDA with Different Errors

  • [ ] doctr_det_predictor
    • Issue: #6005
  • [ ] simple_gpt
    • RTX 2060 doesn't support BF16
    • Issue: #6011
      • PyTorch/XLA PR: #6296
      • PyTorch/XLA PR: #6076
    • Issue: #6010
      • PyTorch/XLA PR: #6060
      • PyTorch/XLA PR: #6071
  • [ ] simple_gpt_tp_manual
    • RTX 2060 doesn't support BF16
    • Issue: #6011
      • PyTorch/XLA PR: #6296
      • PyTorch/XLA PR: #6076
    • Issue: #6010
      • PyTorch/XLA PR: #6060
      • PyTorch/XLA PR: #6071
  • [ ] tacotron2
    • Issue: #6005
    • Issue: #6010
      • PyTorch/XLA PR: #6060
      • PyTorch/XLA PR: #6071

Training

Non-Dynamo. Pass rate: 67/99 (67%)

  • [ ] DALLE2_pytorch
    • Issue: #6084
    • Issue: #6010
      • PyTorch/XLA PR: #6060
      • PyTorch/XLA PR: #6071
  • [ ] demucs
    • Issue: #6003
  • [ ] densenet121
    • Issue: #6003
  • [x] detectron2_fasterrcnn_r_101_c4
    • Issue: #6004
  • [x] detectron2_fasterrcnn_r_101_dc5
    • Issue: #6004
  • [x] detectron2_fasterrcnn_r_101_fpn
    • Issue: #6004
  • [x] detectron2_fasterrcnn_r_50_c4
    • Issue: #6004
  • [x] detectron2_fasterrcnn_r_50_dc5
    • Issue: #6004
  • [x] detectron2_fasterrcnn_r_50_fpn
    • Issue: #6004
  • [ ] detectron2_fcos_r_50_fpn
    • Skipped by the benchmarking script
  • [x] detectron2_maskrcnn_r_101_c4
    • Issue: #6004
  • [x] detectron2_maskrcnn_r_101_fpn
    • Issue: #6004
  • [x] detectron2_maskrcnn_r_50_c4
    • Issue: #6004
  • [x] detectron2_maskrcnn_r_50_fpn
    • Issue: #6004
  • [ ] dlrm
    • Issue: #6008
  • [ ] hf_GPT2_large
    • Issue: #6003
  • [ ] hf_Longformer
    • Issue: #5835
  • [ ] hf_T5_base
    • Issue: #6003
  • [ ] llama_v2_7b_16h
    • Issue: #6003
  • [ ] moco
    • Issue: #6083
    • Issue: #6011
      • PyTorch/XLA PR: #6296
      • PyTorch/XLA PR: #6076
    • Issue: #6010
      • PyTorch/XLA PR: #6060
      • PyTorch/XLA PR: #6071
  • [ ] nvidia_deeprecommender
    • RTX 2060 OOM
    • Issue: #6006
      • PyTorch/XLA PR: #6296
      • PyTorch/XLA PR: #6076
  • [x] pytorch_CycleGAN_and_pix2pix
    • Issue: #6007
      • PyTorch/XLA PR: #6296
      • PyTorch/XLA PR: #6076
  • [ ] stable_diffusion_unet
    • Issue: #6003
  • [ ] tacotron2
    • Issue: #6112
    • Issue: #6010
      • PyTorch/XLA PR: #6060
      • PyTorch/XLA PR: #6071
  • [x] timm_efficientdet
    • Issue: #6011
      • PyTorch/XLA PR: #6296
      • PyTorch/XLA PR: #6076
    • Issue: #6010
      • PyTorch/XLA PR: #6060
      • PyTorch/XLA PR: #6071
  • [ ] timm_nfnet
    • Issue: #6003
  • [ ] timm_vision_transformer_large
    • Issue: #6003
  • [x] yolov3
    • Issue: #6010
      • PyTorch/XLA PR: #6060
      • PyTorch/XLA PR: #6071

Dynamo+openxla. Pass rate: 57/99 (57%)

  • [ ] densenet121
    • Issue: #6003
  • [ ] dlrm
    • Issue: #6008
  • [x] hf_BigBird
    • Issue: #5966
      • PyTorch/XLA PR: #6170
  • [x] hf_GPT2
    • PyTorch/XLA PR: #5922
  • [x] hf_GPT2_large
    • PyTorch/XLA PR: #5922
  • [ ] hf_Longformer
    • Issue: #5835
  • [ ] hf_Reformer
    • Issue: #6009
  • [ ] moco
    • Issue: #6083
    • Issue: #6011
      • PyTorch/XLA PR: #6296
      • PyTorch/XLA PR: #6076
    • Issue: #6010
      • PyTorch/XLA PR: #6060
      • PyTorch/XLA PR: #6071
  • [ ] nvidia_deeprecommender
    • Issue: #6084
    • Issue: #6006
      • PyTorch/XLA PR: #6296
      • PyTorch/XLA PR: #6076
  • [x] pytorch_CycleGAN_and_pix2pix
    • Issue: #6007
      • PyTorch/XLA PR: #6296
      • PyTorch/XLA PR: #6076
  • [ ] stable_diffusion_unet
    • Issue: #6003
  • [ ] timm_efficientdet
    • Issue: #6003
    • Issue: #6011
      • PyTorch/XLA PR: #6296
      • PyTorch/XLA PR: #6076
    • Issue: #6010
      • PyTorch/XLA PR: #6060
      • PyTorch/XLA PR: #6071
  • [x] timm_vision_transformer
    • Issue: #6003
  • [x] torch_multimodal_clip
    • Issue: #6005
  • [x] yolov3
    • Issue: #6010
      • PyTorch/XLA PR: #6060
      • PyTorch/XLA PR: #6071

Models also Failing on Inductor

No Training Support on Inductor CUDA

Benchmarks that raise the error: Model's DEFAULT_TRAIN_BSIZE is not implemented. (A sketch of the TorchBench convention behind this error follows the list below.)

  • [ ] cm3leon_generate
  • [ ] detectron2_fcos_r_50_fpn
  • [ ] doctr_det_predictor
  • [ ] doctr_reco_predictor
  • [ ] hf_T5_generate
  • [ ] llama
  • [ ] phi_1_5
  • [ ] pyhpc_equation_of_state
  • [ ] pyhpc_isoneutral_mixing
  • [ ] pyhpc_turbulent_kinetic_energy
  • [ ] sam
  • [ ] simple_gpt
  • [ ] simple_gpt_tp_manual
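
For context, here is an illustrative sketch of the TorchBench convention behind this error; the class below is a stand-in, not the actual torchbenchmark code. Benchmark model classes declare default batch sizes as class attributes, and requesting a training run for a model that does not define DEFAULT_TRAIN_BSIZE raises the error above.

```python
# Illustrative stand-in for a TorchBench model without training support.
class Model:
    DEFAULT_EVAL_BSIZE = 8      # inference batch size is defined...
    DEFAULT_TRAIN_BSIZE = None  # ...but no training batch size is provided

    def __init__(self, test: str):
        if test == "train" and self.DEFAULT_TRAIN_BSIZE is None:
            raise NotImplementedError(
                "Model's DEFAULT_TRAIN_BSIZE is not implemented.")
        self.batch_size = (self.DEFAULT_TRAIN_BSIZE if test == "train"
                           else self.DEFAULT_EVAL_BSIZE)


Model(test="train")  # raises NotImplementedError, so the train test is skipped
```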

Training Failing on Inductor CUDA with the Same Error

Benchmarks that raise the same error on inductor:

  • [ ] DALLE2_pytorch
    • Issue: #6084
    • Issue: #6010
      • PyTorch/XLA PR: #6060
      • PyTorch/XLA PR: #6071
  • [ ] demucs
    • Issue: #6003
  • [ ] llama_v2_7b_16h
    • Issue: #6003
  • [ ] maml
    • Issue: #6084
  • [ ] timm_vision_transformer_large
    • Issue: #6003
  • [ ] vision_maskrcnn
    • targets should not be none when in training mode
    • Fix https://github.com/pytorch/pytorch/pull/114774

Training Failing on Inductor CUDA with Different Errors

  • [ ] detectron2_fasterrcnn_r_101_c4
    • Issue: #5966
      • PyTorch/XLA PR: #6170
  • [ ] detectron2_fasterrcnn_r_101_dc5
    • Issue: #5966
      • PyTorch/XLA PR: #6170
  • [ ] detectron2_fasterrcnn_r_101_fpn
    • Issue: #5966
      • PyTorch/XLA PR: #6170
  • [ ] detectron2_fasterrcnn_r_50_c4
    • Issue: #5966
      • PyTorch/XLA PR: #6170
  • [ ] detectron2_fasterrcnn_r_50_dc5
    • Issue: #5966
      • PyTorch/XLA PR: #6170
  • [ ] detectron2_fasterrcnn_r_50_fpn
    • Issue: #5966
      • PyTorch/XLA PR: #6170
  • [ ] detectron2_maskrcnn
    • Issue: #5966
      • PyTorch/XLA PR: #6170
  • [ ] detectron2_maskrcnn_r_101_c4
    • Issue: #5966
      • PyTorch/XLA PR: #6170
  • [ ] detectron2_maskrcnn_r_101_fpn
    • Issue: #5966
      • PyTorch/XLA PR: #6170
  • [ ] detectron2_maskrcnn_r_50_c4
    • Issue: #5966
      • PyTorch/XLA PR: #6170
  • [ ] detectron2_maskrcnn_r_50_fpn
    • Issue: #5966
      • PyTorch/XLA PR: #6170
  • [ ] opacus_cifar10
    • Issue: #5967
  • [ ] tacotron2
    • Issue: #6005
    • Issue: #6010
      • PyTorch/XLA PR: #6060
      • PyTorch/XLA PR: #6071

cc @JackCaoG @miladm

ysiraichi (Nov 28 '23)

State after 7 weeks of work:

Models fixed so far:

  • pyhpc_isoneutral_mixing
  • pyhpc_turbulent_kinetic_energy
  • dlrm
  • Super_SloMo
  • speech_transformer

PRs to fix the models. For an updated list see [XLA, pytorch/benchmarks, pytorch/pytorch]

  • https://github.com/pytorch/xla/pull/5688
  • https://github.com/pytorch/xla/pull/5689
  • https://github.com/pytorch/xla/pull/5707
  • https://github.com/pytorch/xla/pull/5743
  • https://github.com/pytorch/xla/pull/5769
  • https://github.com/pytorch/xla/pull/5823
  • https://github.com/pytorch/xla/pull/5914
  • https://github.com/pytorch/pytorch/pull/112202
  • https://github.com/pytorch/pytorch/pull/114626
  • https://github.com/pytorch/benchmark/pull/2038

PRs in flight. For an updated list see [XLA, pytorch/pytorch, pytorch/benchmarks]

  • https://github.com/pytorch/pytorch/pull/114932
  • https://github.com/pytorch/xla/pull/5922
  • https://github.com/pytorch/xla/pull/5960
  • https://github.com/pytorch/xla/pull/5963
  • https://github.com/pytorch/xla/pull/5939
  • https://github.com/pytorch/benchmark/pull/2072

Issues identified that the PRs in flight do not fix. For an updated list see [XLA, pytorch/pytorch]

  • https://github.com/pytorch/xla/issues/5835
  • https://github.com/pytorch/xla/issues/5837
  • https://github.com/pytorch/xla/issues/5839
  • https://github.com/pytorch/xla/issues/5932
  • https://github.com/pytorch/xla/issues/5942
  • https://github.com/pytorch/pytorch/issues/111033
  • https://github.com/pytorch/pytorch/issues/114302

lezcano (Dec 01 '23)

Weekly update (Dec 1~Dec 10):

Models fixed:

  • DALLE2_pytorch
    • training is now failing with the same error as inductor
  • stable_diffusion_unet
    • training is still failing with OOM
  • stable_diffusion_text_encoder
  • hf_GPT2
  • hf_GPT2_large
    • training without dynamo is still failing
  • yolov3
    • Possibly failing due to a cuDNN error, which is likely an OOM, on an RTX 2060. Haven't tested it on an A100 yet, though.

PRs merged. For an updated list see [XLA, pytorch/benchmarks, pytorch/pytorch]

  • https://github.com/pytorch/xla/pull/5922
  • https://github.com/pytorch/xla/pull/5939
  • https://github.com/pytorch/xla/pull/6060
  • https://github.com/pytorch/xla/pull/6068
  • https://github.com/pytorch/xla/pull/6069
  • https://github.com/pytorch/xla/pull/6071
  • https://github.com/pytorch/benchmark/pull/2072
  • https://github.com/pytorch/pytorch/pull/114932

PRs in flight. For an updated list see [XLA, pytorch/pytorch, pytorch/benchmarks]

  • https://github.com/pytorch/xla/pull/6076
  • https://github.com/pytorch/xla/pull/6067
  • https://github.com/pytorch/xla/pull/6070
  • https://github.com/pytorch/xla/pull/6072

Issues identified that the PRs in flight do not fix. For an updated list see [XLA, pytorch/pytorch]

  • https://github.com/pytorch/xla/issues/5966
  • https://github.com/pytorch/xla/issues/5967
  • https://github.com/pytorch/xla/issues/6003
  • https://github.com/pytorch/xla/issues/6004
  • https://github.com/pytorch/xla/issues/6005
  • https://github.com/pytorch/xla/issues/6008
  • https://github.com/pytorch/xla/issues/6009
  • https://github.com/pytorch/xla/issues/6083
  • https://github.com/pytorch/xla/issues/6085
  • https://github.com/pytorch/xla/issues/6086

ysiraichi (Dec 11 '23)

Weekly update (Dec 11~Dec 15):

Models fixed:

  • pytorch_CycleGAN_and_pix2pix
  • nvidia_deeprecommender
    • dynamo+openxla training is still failing
  • simple_gpt and simple_gpt_tp_manual
    • failing due to the same reasons as inductor
  • moco
    • failing due to distributed backend
  • timm_efficientdet
    • dynamo+openxla training is still failing

PRs merged. For an updated list see [XLA, pytorch/benchmarks, pytorch/pytorch]

  • https://github.com/pytorch/xla/pull/6072
  • https://github.com/pytorch/xla/pull/6076
  • https://github.com/pytorch/xla/pull/6130
  • https://github.com/pytorch/xla/pull/6153
  • https://github.com/pytorch/xla/pull/6182

PRs in flight. For an updated list see [XLA, pytorch/pytorch, pytorch/benchmarks]

  • https://github.com/pytorch/xla/pull/6070
  • https://github.com/pytorch/xla/pull/6160
  • https://github.com/pytorch/xla/pull/6170
  • https://github.com/pytorch/xla/pull/6178
  • https://github.com/pytorch/xla/pull/6180
  • https://github.com/pytorch/pytorch/pull/115924

Issues identified that the PRs in flight do not fix. For an updated list see [XLA, pytorch/pytorch]

  • https://github.com/pytorch/xla/issues/6084
  • https://github.com/pytorch/xla/issues/6112
  • https://github.com/pytorch/pytorch/issues/115900

ysiraichi (Dec 15 '23)

Can we please add a pass rate table in the weekly report that includes:

Inference

  • Inductor, Dynamo+PyTorch/XLA:GPU, Non-Dynamo+PyTorch/XLA:GPU

Training

  • Inductor, Dynamo+PyTorch/XLA:GPU, Non-Dynamo+PyTorch/XLA:GPU

miladm (Jan 10 '24)

Weekly update (Jan 8 ~ Jan 12):

Pass rate (out of 99 benchmarks):

| | Inference | Training |
|---|---|---|
| Inductor | 91 | 64 |
| Non-Dynamo | 87 | 67 |
| Dynamo | 86 | 57 |

Models fixed:

  • detectron2 models (inference with dynamo)
  • hf_BigBird (inference and training with dynamo)
  • torch_multimodal_clip (training with dynamo)
  • timm_vision_transformer (training with dynamo)
  • Likely not due to the merged PRs below:
    • detectron2 models: all but detectron2_fcos_r_50_fpn (training without dynamo)

PRs merged. For an updated list see [XLA, pytorch/benchmarks, pytorch/pytorch]

  • https://github.com/pytorch/pytorch/pull/115924

PRs in flight. For an updated list see [XLA, pytorch/pytorch, pytorch/benchmarks]

  • https://github.com/pytorch/xla/pull/6302
  • https://github.com/pytorch/xla/pull/6296
  • https://github.com/pytorch/xla/pull/6160
  • https://github.com/pytorch/xla/pull/6070

Issues identified that the PRs in flight do not fix. For an updated list see [XLA, pytorch/pytorch]

  • https://github.com/pytorch/xla/issues/6292

ysiraichi (Jan 16 '24)

Weekly update (Jan 15 ~ Jan 19):

Pass rate (out of 99 benchmarks):

| | Inference | Training |
|---|---|---|
| Inductor | 85 | 62 |
| Non-Dynamo | 70 | 57 |
| Dynamo | 71 | 55 |

Models that started failing:

  • After #6296:
    • detectron2_fasterrcnn_r_101_c4
    • detectron2_fasterrcnn_r_101_dc5
    • detectron2_fasterrcnn_r_101_fpn
    • detectron2_fasterrcnn_r_50_c4
    • detectron2_fasterrcnn_r_50_dc5
    • detectron2_fasterrcnn_r_50_fpn
    • detectron2_fcos_r_50_fpn
    • detectron2_maskrcnn_r_101_c4
    • detectron2_maskrcnn_r_101_fpn
    • detectron2_maskrcnn_r_50_c4
    • detectron2_maskrcnn_r_50_fpn
    • mobilenet_v3_large
    • timm_regnet
    • hf_Bart
  • Started being skipped:
    • pytorch_CycleGAN_and_pix2pix
    • pytorch_unet
  • Unsupported precision:
    • pytorch_unet
    • yolov3
  • cuDNN error:
    • Super_SloMo (inductor)

PRs merged. For an updated list see [XLA, pytorch/benchmarks, pytorch/pytorch]

  • https://github.com/pytorch/xla/pull/6302
  • https://github.com/pytorch/xla/pull/6296
  • https://github.com/pytorch/xla/pull/6325

PRs in flight. For an updated list see [XLA, pytorch/pytorch, pytorch/benchmarks]

  • https://github.com/pytorch/xla/pull/6160
  • https://github.com/pytorch/xla/pull/6070

Issues identified that the PRs in flight do not fix. For an updated list see [XLA, pytorch/pytorch]

  • https://github.com/pytorch/xla/issues/6336

ysiraichi (Jan 23 '24)

Can we track separate pass rate tables for L4 and A100 GPUs going forward @ysiraichi?

cc @frgossen @golechwierowicz @cota

miladm (Jan 23 '24)

Weekly update (Jan 22 ~ Jan 26):

Pass rate (out of 99 benchmarks):

| | Inference | Training |
|---|---|---|
| Inductor | 88 | 63 |
| Non-Dynamo | 69 | 57 |
| Dynamo | 72 | 55 |

Models fixed:

  • (inductor) moco
  • (inductor) Super_SloMo
    • Failed when executed with all other benchmarks
    • Passed when executed alone (by specifying --filter argument)
  • (inference) llama_v2_7b_16h

Models that started failing:

  • (inference + non-dynamo) timm_efficientnet (to be fixed by: #6389)
  • (inference + non-dynamo) timm_nfnet (to be fixed by: #6389)

PRs merged. For an updated list see [XLA, pytorch/benchmarks, pytorch/pytorch]

  • https://github.com/pytorch/xla/pull/6350
  • https://github.com/pytorch/xla/pull/6374
  • https://github.com/pytorch/xla/pull/6375
  • https://github.com/pytorch/benchmark/pull/2124
  • https://github.com/pytorch/pytorch/pull/118032

PRs in flight. For an updated list see [XLA, pytorch/pytorch, pytorch/benchmarks]

  • https://github.com/pytorch/xla/pull/6389
  • https://github.com/pytorch/xla/pull/6160
  • https://github.com/pytorch/xla/pull/6070

Issues identified that the PRs in flight do not fix. For an updated list see [XLA, pytorch/pytorch]

  • https://github.com/pytorch/xla/issues/6348
  • https://github.com/pytorch/xla/issues/6353
  • https://github.com/pytorch/xla/issues/6366
  • https://github.com/pytorch/xla/issues/6367
  • https://github.com/pytorch/xla/issues/6380
  • https://github.com/pytorch/xla/issues/6391

ysiraichi (Jan 29 '24)

Weekly update (Jan 29 ~ Feb 2):

Pass rate (out of 99 benchmarks):

A100

| | Inference | Training |
|---|---|---|
| Inductor | 87 (last: 88) | 63 |
| Non-Dynamo | 82 (last: 69) | 56 (last: 57) |
| Dynamo | 82 (last: 72) | 53 (last: 55) |

L4

| | Inference | Training |
|---|---|---|
| Inductor | 86 | 60 |
| Non-Dynamo | 81 | 53 |
| Dynamo | 82 | 49 |

Models Summary (for A100)

  • Inductor: Inference (-4, +3)
    • (fail) New skips by PyTorch's torchbench skip list:
      • detectron2_maskrcnn
      • hf_Bert
      • hf_Bert_large
      • maml
    • (pass) Remove outdated skip:
      • vision_maskrcnn
    • (pass) AMP supported:
      • pytorch_unet
      • yolov3
  • Inductor: Training (-3, +3)
    • (fail) New skips by PyTorch's torchbench skip list:
      • hf_Bert
      • hf_Bert_large
    • (fail) Failing due to sparse error:
      • dlrm
    • (pass) AMP supported:
      • pytorch_unet
    • (pass) No OOM:
      • demucs
      • opacus_cifar10
  • XLA:GPU (non-dynamo): Inference (-3, +16)
    • (fail) New skips by PyTorch's torchbench skip list:
      • detectron2_maskrcnn
      • hf_Bert
      • hf_Bert_large
    • (pass) Forcing fp32 precision (while setting XLA_USE_FP16):
      • detectron2 benchmarks (11)
      • mobilenet_v3_large
      • timm_efficientnet
      • timm_nfnet
      • timm_regnet
    • (pass) AMP supported:
      • yolov3
  • XLA:GPU (non-dynamo): Training (-2, +1)
    • (fail) New skips by PyTorch's torchbench skip list:
      • hf_Bert
      • hf_Bert_large
    • (pass) No OOM:
      • hf_GPT2_large
  • XLA:GPU (dynamo): Inference (-4, +14)
    • (fail) New skips by PyTorch's torchbench skip list:
      • detectron2_maskrcnn
      • hf_Bert
      • hf_Bert_large
      • maml
    • (pass) Remove outdated skip:
      • vision_maskrcnn
    • (pass) Forcing fp32 precision (while setting XLA_USE_FP16):
      • detectron2 benchmarks (11)
      • hf_Bart
    • (pass) AMP supported:
      • yolov3
  • XLA:GPU (dynamo): Training (-2, +0)
    • (fail) New skips by PyTorch's torchbench skip list:
      • hf_Bert
      • hf_Bert_large

PRs merged. For an updated list see [XLA, pytorch/benchmarks, pytorch/pytorch]

  • https://github.com/pytorch/xla/pull/6070
  • https://github.com/pytorch/xla/pull/6160
  • https://github.com/pytorch/xla/pull/6389
  • https://github.com/pytorch/xla/pull/6402
  • https://github.com/pytorch/xla/pull/6407
  • https://github.com/pytorch/xla/pull/6416
  • https://github.com/pytorch/xla/pull/6419
  • https://github.com/pytorch/xla/pull/6421
  • https://github.com/pytorch/xla/pull/6446
  • https://github.com/pytorch/xla/pull/6447

PRs in flight. For an updated list see [XLA, pytorch/pytorch, pytorch/benchmarks]

  • https://github.com/pytorch/pytorch/pull/118783

Issues identified that the PRs in flight do not fix. For an updated list see [XLA, pytorch/pytorch]

  • https://github.com/pytorch/xla/issues/6403
  • https://github.com/pytorch/xla/issues/6404

ysiraichi (Feb 05 '24)

Weekly update (Feb 5 ~ Feb 9):

Pass rate (out of 99 benchmarks):

A100

| | Inference | Training |
|---|---|---|
| Inductor | 87 (last: 87) | 63 |
| Non-Dynamo | 82 (last: 82) | 57 (last: 56) |
| Dynamo | 84 (last: 82) | 53 (last: 53) |

L4

| | Inference | Training |
|---|---|---|
| Inductor | 86 | 60 |
| Non-Dynamo | 81 | 53 |
| Dynamo | 84 | 49 |

Models Summary

  • XLA:GPU (non-dynamo): Training (0, +1)
    • (pass) No OOM:
      • densenet121
  • XLA:GPU (dynamo): Inference (0, +2)
    • (pass) Increased compilation cache (see the sketch after this list):
      • cm3leon_generate
      • hf_T5_generate
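
A minimal sketch of what "increased compilation cache" plausibly refers to here, assuming it is TorchDynamo's recompilation cache limit (generative benchmarks recompile for many sequence lengths). The config name is real, but treating it as the exact change that was made is an assumption.

```python
import torch._dynamo as dynamo

# Assumption: cm3leon_generate and hf_T5_generate trigger many recompilations,
# so the small default cache limit makes Dynamo fall back too early.
# Raising the limit lets more specialized graphs be cached.
dynamo.config.cache_size_limit = 64
```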

PRs merged. For an updated list see [XLA, pytorch/benchmarks, pytorch/pytorch]

  • https://github.com/pytorch/xla/pull/6484
  • https://github.com/pytorch/xla/pull/6491
  • https://github.com/pytorch/xla/pull/6509
  • https://github.com/pytorch/xla/pull/6512
  • https://github.com/pytorch/pytorch/pull/118783

PRs in flight. For an updated list see [XLA, pytorch/pytorch, pytorch/benchmarks]

  • https://github.com/pytorch/xla/pull/6518

Issues identified that the PRs in flight do not fix. For an updated list see [XLA, pytorch/pytorch]

  • https://github.com/pytorch/xla/issues/6483
  • https://github.com/pytorch/xla/issues/6511
  • https://github.com/pytorch/pytorch/issues/119680

ysiraichi (Feb 12 '24)

Weekly update (Feb 12 ~ Feb 16):

Pass rate (out of 99 benchmarks):

Could not run the benchmarks this time, due to a compilation issue: #6564


PRs merged. For an updated list see [XLA, pytorch/benchmarks, pytorch/pytorch]

  • https://github.com/pytorch/xla/pull/6518
  • https://github.com/pytorch/xla/pull/6558
  • https://github.com/pytorch/xla/pull/6550

PRs in flight. For an updated list see [XLA, pytorch/pytorch, pytorch/benchmarks]

  • https://github.com/pytorch/xla/pull/6542
  • https://github.com/pytorch/pytorch/pull/120117

Issues identified that the PRs in flight do not fix. For an updated list see [XLA, pytorch/pytorch]

  • https://github.com/pytorch/xla/issues/6520
  • https://github.com/pytorch/xla/issues/6521
  • https://github.com/pytorch/xla/issues/6540
  • https://github.com/pytorch/xla/issues/6556
  • https://github.com/pytorch/xla/issues/6557
  • https://github.com/pytorch/xla/issues/6564
  • https://github.com/pytorch/pytorch/issues/120115

ysiraichi (Feb 19 '24)

Weekly update (Feb 19 ~ Feb 23):

Pass rate (out of 99 benchmarks):

There was an error in the benchmarking scripts that prevented us from running with XLA: https://github.com/pytorch/xla/pull/6612


PRs merged. For an updated list see [XLA, pytorch/benchmarks, pytorch/pytorch]

  • https://github.com/pytorch/xla/pull/6597
  • https://github.com/pytorch/pytorch/pull/120117
  • https://github.com/pytorch/pytorch/pull/120299

PRs in flight. For an updated list see [XLA, pytorch/pytorch, pytorch/benchmarks]

  • https://github.com/pytorch/xla/pull/6542
  • https://github.com/pytorch/xla/pull/6612
  • https://github.com/pytorch/pytorch/pull/120435

Issues identified that the PRs in flight do not fix. For an updated list see [XLA, pytorch/pytorch]

  • https://github.com/pytorch/pytorch/issues/120336
  • https://github.com/pytorch/pytorch/issues/120585

ysiraichi (Feb 26 '24)

Pass rate (out of 99 benchmarks):

A100

| | Inference | Training |
|---|---|---|
| Inductor | 81 (last: 87) | 65 (last: 63) |
| Non-Dynamo | 72 (last: 82) | 61 (last: 57) |
| Dynamo | 73 (last: 84) | 54 (last: 53) |

L4

| | Inference | Training |
|---|---|---|
| Inductor | 81 (last: 86) | 62 (last: 60) |
| Non-Dynamo | 71 (last: 81) | 57 (last: 53) |
| Dynamo | 73 (last: 84) | 52 (last: 49) |

Models Summary

  • Inductor: Inference (-10, +4)

    • (fail) "roi_align_forward_kernel" not implemented for 'BFloat16' (after: #6518)
      • detectron2 benchmarks (10)
    • (pass) Remove outdated skips
      • hf_Bert and hf_Bert_large
      • maml
      • pytorch_CycleGAN_and_pix2pix
  • Inductor: Training (-3, +5)

    • (fail) Running on AMP (after: #6518)
      • mobilenet_v2_quantized_qat
      • resnet50_quantized_qat
    • (pass) Remove outdated skips
      • hf_Bert and hf_Bert_large
      • pytorch_CycleGAN_and_pix2pix
  • XLA:GPU (non-dynamo): Inference (-15, +5)

    • (fail) Error while lowering: aten::upsample_bilinear2d (after: #6518) (issue: #6520)
      • Background_Matting
    • (fail) CPU fallback does not work with mixed dtypes (issue: #6336)
      • detectron2 benchmarks (11)
    • (fail) Seen floating point types of different precisions in HLO (after: #6518) (issue: #6521)
      • hf_GPT2 and hf_GPT2_large
    • (fail) Indices types are not Long (they are Int) (after: #6518) (issue: #6648)
      • llama
    • (pass) Remove outdated skips
      • hf_Bert and hf_Bert_large
      • maml
      • pytorch_CycleGAN_and_pix2pix
      • pytorch_unet
  • XLA:GPU (non-dynamo): Training (0, +4)

    • (pass) Remove outdated skips
      • hf_Bert and hf_Bert_large
      • pytorch_CycleGAN_and_pix2pix
      • pytorch_unet
  • XLA:GPU (dynamo): Inference (-16, +5)

    • (fail) expected scalar type Float but found Half (after: #6518) (issue: #6556)
      • Super_SloMo
    • (fail) CPU fallback does not work with mixed dtypes (issue: #6336)
      • detectron2 benchmarks (11)
    • (fail) Seen floating point types of different precisions in HLO (after: #6518) (issue: #6521)
      • hf_GPT2 and hf_GPT2_large
    • (fail) Indices types are not Long (they are Int) (after: #6518) (issue: #6648)
      • llama
    • (fail) Slice size at index 0 in gather op is out of range, must be within [0, 1), got 1. (issue: #6557)
      • vision_maskrcnn
  • XLA:GPU (dynamo): Training (-4, +5)

    • (fail) expected scalar type Float but found Half (after: #6518) (issue: #6556)
      • Super_SloMo
    • (fail) Seen floating point types of different precisions in HLO (after: #6518)
      • hf_GPT2 and hf_GPT2_large (issue: #6521)
      • timm_nfnet (issue: #6649)
    • (pass) Remove outdated skips
      • hf_Bert and hf_Bert_large
      • pytorch_CycleGAN_and_pix2pix
      • pytorch_unet
    • (pass) No OOM
      • stable_diffusion_unet

ysiraichi (Feb 27 '24)

Weekly update (Feb 26 ~ Mar 01):

Pass rate (out of 99 benchmarks):

A100

| | Inference | Training |
|---|---|---|
| Inductor | 81 (last: 81) | 65 (last: 65) |
| Non-Dynamo | 72 (last: 72) | 61 (last: 61) |
| Dynamo | 73 (last: 73) | 56 (last: 54) |

L4

| | Inference | Training |
|---|---|---|
| Inductor | 81 (last: 81) | 63 (last: 62) |
| Non-Dynamo | 72 (last: 71) | 58 (last: 57) |
| Dynamo | 71 (last: 73) | 54 (last: 52) |

Models Summary

  • XLA:GPU (non-dynamo): Training (-1, +1)

    • (fail) Timeout:
      • timm_efficientdet
    • (pass) Smaller batch size
      • demucs
  • XLA:GPU (dynamo): Inference (-2, 0)

    • (fail) Timeout:
      • cm3leon_generate
      • hf_T5_generate
  • XLA:GPU (dynamo): Training (0, +2)

    • (pass) Smaller batch size
      • densenet121
      • timm_efficientdet

PRs merged. For an updated list see [XLA, pytorch/benchmarks, pytorch/pytorch]

  • https://github.com/pytorch/xla/pull/6542
  • https://github.com/pytorch/xla/pull/6612
  • https://github.com/pytorch/xla/pull/6632

PRs in flight. For an updated list see [XLA, pytorch/pytorch, pytorch/benchmarks]

  • https://github.com/pytorch/xla/pull/6659
  • https://github.com/pytorch/xla/pull/6624
  • https://github.com/pytorch/xla/pull/6661
  • https://github.com/pytorch/pytorch/pull/120435
  • https://github.com/pytorch/pytorch/pull/121007
  • https://github.com/pytorch/pytorch/pull/121074
  • https://github.com/pytorch/pytorch/pull/121075

Issues identified that the PRs in flight do not fix. For an updated list see [XLA, pytorch/pytorch]

  • https://github.com/pytorch/xla/pull/6648
  • https://github.com/pytorch/xla/pull/6649

ysiraichi (Mar 04 '24)

Weekly update (Mar 04 ~ Mar 08):

Pass rate (out of 99 benchmarks):

A100

| | Inference | Training |
|---|---|---|
| Inductor | 81 (last: 81) | 66 (last: 65) |
| Non-Dynamo | 72 (last: 72) | 61 (last: 61) |
| Dynamo | 71 (last: 71) | 57 (last: 56) |

L4

| | Inference | Training |
|---|---|---|
| Inductor | 81 (last: 81) | 64 (last: 63) |
| Non-Dynamo | 72 (last: 72) | 58 (last: 58) |
| Dynamo | 71 (last: 71) | 55 (last: 54) |

Models Summary (A100)

  • Inductor: Training (0, +1)

    • (pass) Reason unknown
      • dlrm
  • XLA:GPU (dynamo): Training (0, +1)

    • (pass) Tensor.new dynamo support (see the sketch after this list)
      • hf_Reformer
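
For reference, a small illustration of what Tensor.new does (the legacy constructor pattern whose Dynamo support unblocked hf_Reformer); the tensors here are placeholders, and the point is only that the result inherits the source tensor's dtype and device.

```python
import torch

x = torch.ones(2, 3, dtype=torch.float64)

# Tensor.new builds a new tensor with the same dtype and device as `x`.
y = x.new([[1.0, 2.0, 3.0]])
print(y.dtype)   # torch.float64

z = x.new(2, 2)  # uninitialized 2x2 tensor, same dtype/device as `x`
print(z.shape)   # torch.Size([2, 2])
```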

PRs merged. For an updated list see [XLA, pytorch/benchmarks, pytorch/pytorch]

  • https://github.com/pytorch/xla/pull/6624
  • https://github.com/pytorch/pytorch/pull/121075

PRs in flight. For an updated list see [XLA, pytorch/pytorch, pytorch/benchmarks]

  • https://github.com/pytorch/xla/pull/6659
  • https://github.com/pytorch/xla/pull/6661
  • https://github.com/pytorch/xla/pull/6697
  • https://github.com/pytorch/pytorch/pull/121007

Issues identified that the PRs in flight do not fix. For an updated list see [XLA, pytorch/pytorch]

ysiraichi (Mar 11 '24)

Weekly update (Mar 11 ~ Mar 15):

Pass rate (out of 99 benchmarks):

A100

| | Inference | Training |
|---|---|---|
| Inductor | 81 (last: 81) | 66 (last: 66) |
| Non-Dynamo | 37 (last: 72) | 28 (last: 61) |
| Dynamo | 31 (last: 71) | 18 (last: 57) |

L4

| | Inference | Training |
|---|---|---|
| Inductor | 81 (last: 81) | 64 (last: 63) |
| Non-Dynamo | 45 (last: 72) | 38 (last: 58) |
| Dynamo | 44 (last: 71) | 22 (last: 55) |

Models Summary (A100)

No summary this week because:

  • Diff is too big
  • It might be due to a pin update

PRs merged. For an updated list see [XLA, pytorch/benchmarks, pytorch/pytorch]

  • https://github.com/pytorch/xla/pull/6718
  • https://github.com/pytorch/xla/pull/6745
  • https://github.com/pytorch/xla/pull/6697

PRs in flight. For an updated list see [XLA, pytorch/pytorch, pytorch/benchmarks]

  • https://github.com/pytorch/xla/pull/6659
  • https://github.com/pytorch/xla/pull/6661
  • https://github.com/pytorch/pytorch/pull/121007

Issues identified that the PRs in flight do not fix. For an updated list see [XLA, pytorch/pytorch]

  • https://github.com/pytorch/xla/pull/6750
  • https://github.com/pytorch/pytorch/pull/121926

ysiraichi (Mar 19 '24)

@ysiraichi The regression you saw might be due to https://github.com/pytorch/xla/pull/6677 (OpenXLA pin update). Our team is looking into this issue.

vanbasten23 (Mar 21 '24)

Weekly update (Mar 18 ~ Mar 21):

Pass rate (out of 99 benchmarks):

A100

| | Inference | Training |
|---|---|---|
| Inductor | 81 (last: 81) | 66 (last: 66) |
| Non-Dynamo | 76 (last: 72) | 64 (last: 61) |
| Dynamo | 73 (last: 71) | 58 (last: 57) |

L4

| | Inference | Training |
|---|---|---|
| Inductor | 80 (last: 81) | 64 (last: 64) |
| Non-Dynamo | 76 (last: 72) | 61 (last: 58) |
| Dynamo | 74 (last: 71) | 56 (last: 55) |

Models Summary (A100)

  • XLA:GPU (non-dynamo): Inference (0, +4)

    • (pass) as_strided_copy new implementation
      • hf_Longformer
    • (pass) pow data-type promotion fixed
      • hf_GPT2
      • hf_GPT2_large
    • (pass) Loosen Embedding index type requirement
      • llama
  • XLA:GPU (non-dynamo): Training (0, +3)

    • (pass) as_strided_copy new implementation
      • hf_Longformer
    • (pass) Unknown reason:
      • hf_T5_base
      • timm_efficientdet
  • XLA:GPU (dynamo): Inference (-2, +4)

    • (pass) as_strided_copy new implementation
      • hf_Longformer
    • (pass) pow data-type promotion fixed
      • hf_GPT2
      • hf_GPT2_large
    • (pass) Loosen Embedding index type requirement
      • llama
    • (fail) Unknown reason:
      • doctr_reco_predictor https://github.com/pytorch/xla/issues/6832
      • speech_transformer https://github.com/pytorch/xla/issues/6831
  • XLA:GPU (dynamo): Training (-2, +3)

    • (pass) as_strided_copy new implementation
      • hf_Longformer
    • (pass) pow data-type promotion fixed
      • hf_GPT2
      • hf_GPT2_large
    • (fail) Unknown reason:
      • hf_Reformer https://github.com/pytorch/xla/issues/6830
      • speech_transformer https://github.com/pytorch/xla/issues/6831

PRs merged. For an updated list see [XLA, pytorch/benchmarks, pytorch/pytorch]

PRs in flight. For an updated list see [XLA, pytorch/pytorch, pytorch/benchmarks]

  • https://github.com/pytorch/xla/pull/6659
  • https://github.com/pytorch/xla/pull/6661
  • https://github.com/pytorch/xla/pull/6814
  • https://github.com/pytorch/pytorch/pull/121007

Issues identified that the PRs in flight do not fix. For an updated list see [XLA, pytorch/pytorch]

ysiraichi (Mar 25 '24)

Last week, the results were unchanged. We are preparing for performance optimizations. cc @ysiraichi

miladm (Apr 01 '24)

Weekly update (Apr 1 ~ Apr 5):

Pass rate (out of 99 benchmarks):

A100

| | Inference | Training |
|---|---|---|
| Inductor | 81 (last: 81) | 66 (last: 66) |
| Non-Dynamo | 75 (last: 76) | 63 (last: 64) |
| Dynamo | 73 (last: 73) | 53 (last: 58) |

L4

| | Inference | Training |
|---|---|---|
| Inductor | 82 (last: 80) | 65 (last: 64) |
| Non-Dynamo | 75 (last: 76) | 61 (last: 61) |
| Dynamo | 74 (last: 74) | 51 (last: 56) |

Models Summary (A100)

  • Inductor: Inference (-1, +1)

    • (pass) dlrm
    • (fail) maml
  • XLA:GPU (non-dynamo): Inference (-1, 0)

    • (fail) timm_efficientdet https://github.com/pytorch/xla/issues/6889
  • XLA:GPU (non-dynamo): Training (-1, 0)

    • (fail) timm_efficientdet: OOM
  • XLA:GPU (dynamo): Inference (-1, +1)

    • (pass) speech_transformer
    • (fail) timm_efficientdet https://github.com/pytorch/xla/issues/6899
  • XLA:GPU (dynamo): Training (-7, +2)

    • (pass) hf_Reformer and speech_transformer
    • (fail) hf_GPT2 and hf_GPT2_large https://github.com/pytorch/xla/issues/6900
    • (fail) hf_T5, hf_T5_base, stable_diffusion_unet, and timm_vision_transformer_large: OOM
    • (fail) hf_T5_large https://github.com/pytorch/xla/issues/6901

PRs merged. For an updated list see [XLA, pytorch/benchmarks, pytorch/pytorch]

  • https://github.com/pytorch/xla/pull/6814
  • https://github.com/pytorch/xla/pull/6881

PRs in flight. For an updated list see [XLA, pytorch/pytorch, pytorch/benchmarks]

  • https://github.com/pytorch/xla/pull/6659
  • https://github.com/pytorch/xla/pull/6661
  • https://github.com/pytorch/pytorch/pull/121007

Issues identified that the PRs in flight do not fix. For an updated list see [XLA, pytorch/pytorch]

  • https://github.com/pytorch/xla/pull/6833
  • https://github.com/pytorch/xla/pull/6899
  • https://github.com/pytorch/xla/pull/6900
  • https://github.com/pytorch/xla/pull/6901

ysiraichi (Apr 08 '24)

Weekly update (Apr 8 ~ Apr 12):

Pass rate (out of 99 benchmarks):

A100

| | Inference | Training |
|---|---|---|
| Inductor | 81 (last: 81) | 66 (last: 66) |
| Non-Dynamo | 74 (last: 75) | 64 (last: 63) |
| Dynamo | 74 (last: 73) | 53 (last: 53) |

L4

| | Inference | Training |
|---|---|---|
| Inductor | 82 (last: 82) | 65 (last: 65) |
| Non-Dynamo | 75 (last: 75) | 61 (last: 61) |
| Dynamo | 75 (last: 74) | 51 (last: 51) |

Models Summary (A100)

  • XLA:GPU (non-dynamo): Inference (-1, 0)

    • (fail) doctr_reco_predictor: TIMEOUT
  • XLA:GPU (non-dynamo): Training (0, +1)

    • (pass) timm_efficientdet
  • XLA:GPU (dynamo): Inference (0, +1)

    • (pass) hf_Reformer

PRs merged. For an updated list see [XLA, pytorch/benchmarks, pytorch/pytorch]

  • https://github.com/pytorch/xla/pull/6659
  • https://github.com/pytorch/xla/pull/6661
  • https://github.com/pytorch/pytorch/pull/121007

PRs in flight. For an updated list see [XLA, pytorch/pytorch, pytorch/benchmarks]

Issues identified that the PRs in flight do not fix. For an updated list see [XLA, pytorch/pytorch]

ysiraichi (Apr 15 '24)

Weekly update (Apr 15 ~ Apr 19):

Pass rate (out of 99 benchmarks):

A100

| | Inference | Training |
|---|---|---|
| Inductor | ? (last: 81) | ? (last: 66) |
| Non-Dynamo | ? (last: 74) | ? (last: 64) |
| Dynamo | ? (last: 74) | ? (last: 53) |

L4

| | Inference | Training |
|---|---|---|
| Inductor | 82 (last: 82) | 65 (last: 65) |
| Non-Dynamo | 76 (last: 75) | 61 (last: 61) |
| Dynamo | 76 (last: 75) | 51 (last: 51) |

Models Summary (A100)


PRs merged. For an updated list see [XLA, pytorch/benchmarks, pytorch/pytorch]

  • https://github.com/pytorch/xla/pull/6933

PRs in flight. For an updated list see [XLA, pytorch/pytorch, pytorch/benchmarks]

Issues identified that the PRs in flight do not fix. For an updated list see [XLA, pytorch/pytorch]

ysiraichi (Apr 22 '24)

Weekly update (Apr 22 ~ Apr 26):

Pass rate (out of 99 benchmarks):

A100

| | Inference | Training |
|---|---|---|
| Inductor | 81 (last: 81) | 66 (last: 66) |
| Non-Dynamo | 75 (last: 74) | 64 (last: 64) |
| Dynamo | 75 (last: 74) | 53 (last: 53) |

L4

| | Inference | Training |
|---|---|---|
| Inductor | 81 (last: 82) | 65 (last: 65) |
| Non-Dynamo | 76 (last: 76) | 61 (last: 61) |
| Dynamo | 76 (last: 76) | 51 (last: 51) |

Models Summary (A100)

  • XLA:GPU (non-dynamo): Inference (0, +1)

    • (pass) timm_efficientdet
  • XLA:GPU (dynamo): Inference (0, +1)

    • (pass) timm_efficientdet

PRs merged. For an updated list see [XLA, pytorch/benchmarks, pytorch/pytorch]

PRs in flight. For an updated list see [XLA, pytorch/pytorch, pytorch/benchmarks]

  • https://github.com/pytorch/xla/pull/6958

Issues identified that the PRs in flight do not fix. For an updated list see [XLA, pytorch/pytorch]

  • https://github.com/pytorch/xla/issues/6988

ysiraichi (Apr 28 '24)

Weekly update (Apr 29 ~ May 3):

Pass rate (out of 99 benchmarks):

A100

| | Inference | Training |
|---|---|---|
| Inductor | 81 (last: 81) | 66 (last: 66) |
| Non-Dynamo | 76 (last: 75) | 64 (last: 64) |
| Dynamo | 75 (last: 75) | 53 (last: 53) |

L4

| | Inference | Training |
|---|---|---|
| Inductor | 82 (last: 81) | 65 (last: 65) |
| Non-Dynamo | 76 (last: 76) | 61 (last: 61) |
| Dynamo | 76 (last: 76) | 51 (last: 51) |

Models Summary (A100)

  • XLA:GPU (non-dynamo): Inference (0, +1)
    • (pass) doctr_reco_predictor

PRs merged. For an updated list see [XLA, pytorch/benchmarks, pytorch/pytorch]

  • https://github.com/pytorch/xla/pull/6958

PRs in flight. For an updated list see [XLA, pytorch/pytorch, pytorch/benchmarks]

Issues identified that the PRs in flight do not fix. For an updated list see [XLA, pytorch/pytorch]

ysiraichi (May 04 '24)

Weekly update (May 6 ~ May 10):

Pass rate (out of 99 benchmarks):

A100

| | Inference | Training |
|---|---|---|
| Inductor | 82 (last: 81) | 66 (last: 66) |
| Non-Dynamo | 76 (last: 75) | 64 (last: 64) |
| Dynamo | 75 (last: 75) | 53 (last: 53) |

L4

| | Inference | Training |
|---|---|---|
| Inductor | 82 (last: 82) | 65 (last: 65) |
| Non-Dynamo | 76 (last: 76) | 61 (last: 61) |
| Dynamo | 76 (last: 76) | 51 (last: 51) |

Notes

  • Inductor on L4 started failing with: SyntaxError: unterminated string literal
    • Oddly enough, A100 didn't have the same error
    • Didn't update the results of L4

Models Summary (A100)

  • Inductor: Inference (0, +1)
    • (pass) maml

PRs merged. For an updated list see [XLA, pytorch/benchmarks, pytorch/pytorch]

  • https://github.com/pytorch/pytorch/pull/125876

PRs in flight. For an updated list see [XLA, pytorch/pytorch, pytorch/benchmarks]

Issues identified that the PRs in flight do not fix. For an updated list see [XLA, pytorch/pytorch]

ysiraichi (May 13 '24)

Weekly update (May 13 ~ May 17):

Pass rate (out of 99 benchmarks):

A100

| | Inference | Training |
|---|---|---|
| Inductor | 82 (last: 82) | 66 (last: 66) |
| Non-Dynamo | 77 (last: 76) | 61 (last: 64) |
| Dynamo | 78 (last: 75) | 55 (last: 53) |

L4

| | Inference | Training |
|---|---|---|
| Inductor | 82 (last: 82) | 65 (last: 65) |
| Non-Dynamo | 77 (last: 76) | 59 (last: 61) |
| Dynamo | 78 (last: 76) | 52 (last: 51) |

Models Summary (A100)

All the differences shown below are likely the result of #7067, which fixes AMP. Reason: (i) training benchmarks use AMP by default; and (ii) some inference benchmarks use AMP instead of bfloat16. A minimal AMP sketch follows the list below.

  • XLA:GPU (non-dynamo): Inference (0, +1)

    • (pass) detectron2_fcos_r_50_fpn
  • XLA:GPU (non-dynamo): Training (-5, +2)

    • (fail) Super_SloMo
    • (fail) mobilenet_v2_quantized_qat
    • (fail) resnet50_quantized_qat
    • (fail) timm_efficientdet
    • (fail) timm_nfnet
    • (pass) stable_diffusion_unet
    • (pass) timm_vision_transformer_large
  • XLA:GPU (dynamo): Inference (0, +3)

    • (pass) Super_SloMo
    • (pass) detectron2_fcos_r_50_fpn
    • (pass) doctr_reco_predictor
  • XLA:GPU (dynamo): Training (0, +2)

    • (pass) Super_SloMo
    • (pass) timm_nfnet
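
Since the differences above hinge on AMP, here is a minimal sketch of the AMP training-step pattern involved, written as generic torch.autocast/GradScaler usage on CUDA; the XLA:GPU path exercised by #7067 is analogous but not shown verbatim, and the model, data, and optimizer below are placeholders.

```python
import torch

model = torch.nn.Linear(16, 4).cuda()  # placeholder model
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
scaler = torch.cuda.amp.GradScaler()
x = torch.randn(8, 16, device="cuda")
target = torch.randn(8, 4, device="cuda")

# Forward pass under autocast: eligible ops run in float16 instead of float32.
with torch.autocast(device_type="cuda", dtype=torch.float16):
    loss = torch.nn.functional.mse_loss(model(x), target)

# Scaled backward pass avoids underflowing float16 gradients.
scaler.scale(loss).backward()
scaler.step(optimizer)
scaler.update()
```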

PRs merged. For an updated list see [XLA, pytorch/benchmarks, pytorch/pytorch]

  • https://github.com/pytorch/xla/pull/7067

PRs in flight. For an updated list see [XLA, pytorch/pytorch, pytorch/benchmarks]

  • https://github.com/pytorch/xla/pull/7080
  • https://github.com/pytorch/xla/pull/7081

Issues identified that the PRs in flight do not fix. For an updated list see [XLA, pytorch/pytorch]

ysiraichi (May 20 '24)

Weekly update (May 20 ~ May 24):

Pass rate (out of 99 benchmarks):

A100

| | Inference | Training |
|---|---|---|
| Inductor | 82 (last: 82) | 66 (last: 66) |
| Non-Dynamo | 77 (last: 77) | 63 (last: 61) |
| Dynamo | 78 (last: 78) | 55 (last: 55) |

L4

| | Inference | Training |
|---|---|---|
| Inductor | 82 (last: 82) | 65 (last: 65) |
| Non-Dynamo | 77 (last: 77) | 61 (last: 59) |
| Dynamo | 78 (last: 78) | 52 (last: 52) |

Models Summary (A100)

  • XLA:GPU (non-dynamo): Training (-5, +2)
    • (pass) Super_SloMo #7067
    • (pass) timm_efficientdet #7091

PRs merged. For an updated list see [XLA, pytorch/benchmarks, pytorch/pytorch]

  • https://github.com/pytorch/xla/pull/7080
  • https://github.com/pytorch/xla/pull/7081
  • https://github.com/pytorch/xla/pull/7090
  • https://github.com/pytorch/xla/pull/7091

PRs in flight. For an updated list see [XLA, pytorch/pytorch, pytorch/benchmarks]

  • https://github.com/pytorch/xla/pull/7111
  • https://github.com/pytorch/xla/pull/7113
  • https://github.com/pytorch/xla/pull/7116

Issues identified that the PRs in flight do not fix. For an updated list see [XLA, pytorch/pytorch]

  • https://github.com/pytorch/xla/issues/7095

ysiraichi (May 25 '24)

Weekly update (May 27 ~ May 29):

PRs merged. For an updated list see [XLA, pytorch/benchmarks, pytorch/pytorch]

  • https://github.com/pytorch/xla/pull/7111
  • https://github.com/pytorch/xla/pull/7113
  • https://github.com/pytorch/xla/pull/7116
  • https://github.com/pytorch/xla/pull/7130

PRs in flight. For an updated list see [XLA, pytorch/pytorch, pytorch/benchmarks]

  • https://github.com/pytorch/xla/pull/7168

Issues identified that the PRs in flight do not fix. For an updated list see [XLA, pytorch/pytorch]

ysiraichi (Jun 03 '24)

Weekly update (June 3 ~ June 6):

Pass rate (out of 99 benchmarks):

A100

| | Inference | Training |
|---|---|---|
| Inductor | 82 (last: 82) | 65 (last: 66) |
| Non-Dynamo | 79 (last: 77) | 61 (last: 63) |
| Dynamo | 79 (last: 78) | 55 (last: 55) |

L4

| | Inference | Training |
|---|---|---|
| Inductor | 82 (last: 82) | 64 (last: 65) |
| Non-Dynamo | 79 (last: 77) | 60 (last: 61) |
| Dynamo | 79 (last: 78) | 52 (last: 52) |

Models Summary (A100)

  • Inductor: Training (-1, +0)

    • (fail) dlrm
  • XLA:GPU (non-dynamo): Inference (-0, +2)

  • XLA:GPU (non-dynamo): Training (-3, +1)

    • (pass) timm_nfnet #7130
    • (fail) drq #7247
    • (fail) stable_diffusion_unet: OOM
    • (fail) timm_vision_transformer_large: OOM
  • XLA:GPU (dynamo): Inference (-0, +1)


PRs merged. For an updated list see [XLA, pytorch/benchmarks, pytorch/pytorch]

  • https://github.com/pytorch/xla/pull/7168

Issues identified that the PRs in flight do not fix. For an updated list see [XLA, pytorch/pytorch]

  • https://github.com/pytorch/xla/issues/7198
  • https://github.com/pytorch/pytorch/issues/128165

ysiraichi (Jun 10 '24)

Weekly update (June 10 ~ June 14):

Pass rate (out of 99 benchmarks):

A100

| | Inference | Training |
|---|---|---|
| Inductor | 82 (last: 82) | 65 (last: 65) |
| Non-Dynamo | 79 (last: 79) | 63 (last: 61) |
| Dynamo | 79 (last: 79) | 55 (last: 55) |

L4

| | Inference | Training |
|---|---|---|
| Inductor | 82 (last: 82) | 64 (last: 64) |
| Non-Dynamo | 79 (last: 79) | 61 (last: 60) |
| Dynamo | 79 (last: 79) | 52 (last: 52) |

Models Summary (A100)

  • XLA:GPU (non-dynamo): Training (-1, +3)
    • (pass) drq
    • (pass) stable_diffusion_unet
    • (pass) timm_vision_transformer_large
    • (fail) timm_nfnet #7271

PRs merged. For an updated list see [XLA, pytorch/benchmarks, pytorch/pytorch]

PRs in flight. For an updated list see [XLA, pytorch/pytorch, pytorch/benchmarks]

  • https://github.com/pytorch/xla/pull/7257
  • https://github.com/pytorch/benchmark/pull/2292

Issues identified that the PRs in flight do not fix. For an updated list see [XLA, pytorch/pytorch]

ysiraichi (Jun 17 '24)