TorchBench is a collection of open source benchmarks used to evaluate PyTorch performance.

Results 259 benchmark issues

TorchBench CI has detected a performance signal.
Base PyTorch version: 2.4.0.dev20240606+cu121
Base PyTorch commit: ffaea656b5d8ff6518669494cc8f664b94f8e8b1
Affected PyTorch version: 2.5.0.dev20240821+cu124
Affected PyTorch commit: c42ac54d9e817bf0a0366eb78e6c8beba4d5eff5
Affected Tests:
- test_train[Super_SloMo-cuda-eager]: +14.63890%
- test_eval[Super_SloMo-cuda-eager]:...

torchbench-perf-report

TorchBench CI has detected a performance signal.
Base PyTorch version: 2.4.0.dev20240606+cu121
Base PyTorch commit: ffaea656b5d8ff6518669494cc8f664b94f8e8b1
Affected PyTorch version: 2.5.0.dev20240820+cu124
Affected PyTorch commit: 92151c814ba715fe7d1f5648b0ae6950dceee6b7
Affected Tests:
- test_train[Super_SloMo-cuda-eager]: +14.63489%
- test_eval[Super_SloMo-cuda-eager]:...

torchbench-perf-report

Summary: X-link: https://github.com/pytorch/pytorch/pull/133859 This diff implements a bunch of views for internal Scuba viewing. TODOs that I might punt to another diff:
- Saving cache stats via counter is definitely...

cla signed
fb-exported

Right now, developers need to manually specify the input shape metadata used to generate the input tensors. I am wondering whether we could use decorators to make this process easier and...
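To make the decorator idea concrete, here is a minimal sketch of what such an interface might look like. Everything in it is hypothetical and not taken from TorchBench: the `InputSpec` class, the `with_inputs` decorator, and the `nested_zeros` stand-in are invented for illustration. Nested Python lists stand in for tensors so the sketch runs without PyTorch; a real version would presumably pass a factory such as `lambda spec: torch.randn(*spec.shape)` as `make_tensor`.

```python
from dataclasses import dataclass
from functools import wraps
from typing import Any, Callable, Tuple


@dataclass(frozen=True)
class InputSpec:
    """Hypothetical shape metadata for one benchmark input."""
    name: str
    shape: Tuple[int, ...]
    dtype: str = "float32"


def nested_zeros(shape: Tuple[int, ...]) -> Any:
    """Stand-in for a tensor: nested lists of zeros with the given shape."""
    if not shape:
        return 0.0
    return [nested_zeros(shape[1:]) for _ in range(shape[0])]


def with_inputs(
    *specs: InputSpec,
    make_tensor: Callable[[InputSpec], Any] = lambda spec: nested_zeros(spec.shape),
):
    """Hypothetical decorator: declare input shapes once, generate inputs on call."""
    def decorator(fn):
        @wraps(fn)
        def wrapper(**overrides):
            # Build one input per declared spec, then let the caller override any.
            inputs = {spec.name: make_tensor(spec) for spec in specs}
            inputs.update(overrides)
            return fn(**inputs)

        # Keep the metadata on the function so a runner could inspect it.
        wrapper.input_specs = specs
        return wrapper
    return decorator


@with_inputs(InputSpec("a", (2, 3)), InputSpec("b", (3, 4)))
def matmul_shapes(a, b):
    # Report the dimensions of the generated inputs instead of doing real work.
    return (len(a), len(a[0]), len(b), len(b[0]))
```

Calling `matmul_shapes()` with no arguments generates both inputs from the declared specs, and passing `a=...` replaces just that input. Attaching the specs to the wrapper is one possible design choice; it would let a benchmark runner enumerate shapes without calling the operator.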

https://github.com/pytorch/benchmark/actions/runs/10168992530/job/28133362873
good: 2.5.0.dev20240729+cu124 (500aea8d5033fd3540c6ed325dd80e7e1420b0f3)
bad: torch: 2.5.0.dev20240730+cu124 (05a8540041cea936a63355c2e38b7b3beb5ce168)
bisect userbenchmark: test_bench
arguments: -m sam_fast -t eval --memleak
Bisection workflow: https://github.com/pytorch/benchmark/actions/runs/10173197195

Added two more variants: triton_tutorial_flash_v2_persistent and triton_tutorial_flash_v2_persistent_tma. The variants handle the non-causal case only. The causal case has 2 invocations of attn_fwd_inner, which means we will have an outer loop and 2 inner...

cla signed

We want the output table to report both the median latency and its +/- variance.
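As a rough illustration of that request, the sketch below summarizes a list of latency samples as a median plus a +/- spread. The issue does not say how the variance should be defined, so the sample standard deviation relative to the median is used here purely as an assumption; the function names `latency_summary` and `format_row` are hypothetical and are not part of TorchBench.

```python
import statistics
from typing import Dict, Sequence


def latency_summary(samples_ms: Sequence[float]) -> Dict[str, float]:
    """Summarize latency samples (in milliseconds) as a median and a spread."""
    if len(samples_ms) < 2:
        raise ValueError("need at least two samples to estimate spread")
    median = statistics.median(samples_ms)
    # Assumption: "+/- variance" is taken to mean the sample standard deviation.
    stdev = statistics.stdev(samples_ms)
    return {
        "median_ms": median,
        "stdev_ms": stdev,
        # Spread as a percentage of the median; latencies are assumed positive.
        "rel_spread_pct": 100.0 * stdev / median,
    }


def format_row(name: str, samples_ms: Sequence[float]) -> str:
    """Render one hypothetical table row: name, median, and +/- percentage."""
    summary = latency_summary(samples_ms)
    return (
        f"{name}: {summary['median_ms']:.3f} ms "
        f"+/- {summary['rel_spread_pct']:.2f}%"
    )
```

Other definitions of spread, such as the interquartile range or (max - min) / median, would drop into `latency_summary` the same way.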

- [ ] Design the PT2 Benchmark Runner based new pytorch nightly testing workflows (T4 and A100)
- [ ] Improve the bisection workflow to support PT2 Benchmark Runner
-...