benchmark issues

V2 Performance Signal Detected by TorchBench CI on '2.5.0.dev20240821+cu124'

TorchBench CI has detected a performance signal.
Base PyTorch version: 2.4.0.dev20240606+cu121
Base PyTorch commit: ffaea656b5d8ff6518669494cc8f664b94f8e8b1
Affected PyTorch version: 2.5.0.dev20240821+cu124
Affected PyTorch commit: c42ac54d9e817bf0a0366eb78e6c8beba4d5eff5
Affected Tests:
- test_train[Super_SloMo-cuda-eager]: +14.63890%
- test_eval[Super_SloMo-cuda-eager]:...

xuzhao9

torchbench-perf-report
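The percentages in these reports read as relative changes between a base run and an affected run. A minimal sketch of that arithmetic follows, assuming each test yields a single latency number per run. The function names, the 7% threshold, and the sample latencies are hypothetical and are not taken from TorchBench CI.

```python
from typing import Dict


def percent_delta(base: float, affected: float) -> float:
    """Relative change of `affected` versus `base`, in percent."""
    return (affected - base) / base * 100.0


def find_signals(base: Dict[str, float], affected: Dict[str, float],
                 threshold_pct: float = 7.0) -> Dict[str, str]:
    """Return tests whose absolute change exceeds the threshold.

    The default threshold is an assumption made for this sketch.
    """
    signals = {}
    for test, base_value in base.items():
        if test not in affected:
            continue  # test missing from the affected run, nothing to compare
        delta = percent_delta(base_value, affected[test])
        if abs(delta) > threshold_pct:
            signals[test] = f"{delta:+.5f}%"
    return signals


# Made-up latencies, for illustration only.
base_run = {"test_train[model_a-cuda-eager]": 100.0,
            "test_eval[model_a-cuda-eager]": 50.0}
new_run = {"test_train[model_a-cuda-eager]": 115.0,
           "test_eval[model_a-cuda-eager]": 51.0}
signals = find_signals(base_run, new_run)
```

With these sample numbers only the training test crosses the threshold, so `signals` holds a single entry formatted in the same style as the report.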

V2 Performance Signal Detected by TorchBench CI on '2.5.0.dev20240820+cu124'

TorchBench CI has detected a performance signal.
Base PyTorch version: 2.4.0.dev20240606+cu121
Base PyTorch commit: ffaea656b5d8ff6518669494cc8f664b94f8e8b1
Affected PyTorch version: 2.5.0.dev20240820+cu124
Affected PyTorch commit: 92151c814ba715fe7d1f5648b0ae6950dceee6b7
Affected Tests:
- test_train[Super_SloMo-cuda-eager]: +14.63489%
- test_eval[Super_SloMo-cuda-eager]:...

xuzhao9

torchbench-perf-report

Log PT2 chromium events to scuba

10

Summary: X-link: https://github.com/pytorch/pytorch/pull/133859
This diff implements a bunch of views for internal scuba viewing.
TODOs that I might punt to another diff:
- Saving cache stats via counter is definitely...

jamesjwu

cla signed

fb-exported

[tritonbench] A better way to encode input shapes with parameters

Right now, developers need to manually specify the input shape metadata to generate the input tensors. I am wondering whether we could use decorators to make this process easier and...

xuzhao9
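One way to picture the decorator idea from the issue above is sketched here. It assumes the shape metadata can be written as a grid of named parameters. `shape_params`, `iter_shapes`, and `attention_inputs` are hypothetical names and are not part of tritonbench. The example returns shape tuples rather than tensors so it runs without PyTorch.

```python
from itertools import product
from typing import Callable, Dict, Iterator, Sequence


def shape_params(**params: Sequence[int]) -> Callable:
    """Hypothetical decorator: record a parameter grid on the input function."""
    def decorator(fn: Callable) -> Callable:
        fn.shape_params = {name: list(values) for name, values in params.items()}
        return fn
    return decorator


def iter_shapes(fn: Callable) -> Iterator[Dict[str, int]]:
    """Yield one {param: value} dict per point in the declared grid."""
    names = list(fn.shape_params)
    for combo in product(*(fn.shape_params[name] for name in names)):
        yield dict(zip(names, combo))


@shape_params(batch=[1, 8], seq_len=[128, 512])
def attention_inputs(batch: int, seq_len: int):
    # A real benchmark would allocate tensors here; this sketch returns the shape only.
    head_dim = 64  # assumed constant, for illustration
    return (batch, seq_len, head_dim)


shapes = [attention_inputs(**point) for point in iter_shapes(attention_inputs)]
```

The grid is declared next to the function that consumes it, so adding a new shape means editing one decorator argument rather than a separate metadata table.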

sam_fast model memory leak

4

https://github.com/pytorch/benchmark/actions/runs/10168992530/job/28133362873
good: 2.5.0.dev20240729+cu124 (500aea8d5033fd3540c6ed325dd80e7e1420b0f3)
bad: torch: 2.5.0.dev20240730+cu124 (05a8540041cea936a63355c2e38b7b3beb5ce168)
bisect userbenchmark: test_bench
arguments: -m sam_fast -t eval --memleak
Bisection workflow: https://github.com/pytorch/benchmark/actions/runs/10173197195

xuzhao9

Persistent version of Flash Attention

Added two more variants: triton_tutorial_flash_v2_persistent and triton_tutorial_flash_v2_persistent_tma. The variants handle the non-causal case only. The causal case has 2 invocations of attn_fwd_inner, which means we will have an outer loop and 2 inner...

manman-ren

cla signed

[tritonbench] Add +/- variance to the latency metric

We want the latency metric in the output table to report both the median and the +/- variance.

xuzhao9
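A minimal sketch of how such a latency cell could be formatted is shown below. The issue does not define the +/- figure, so this example assumes it is the largest deviation from the median, expressed as a percentage of the median. `format_latency` is a hypothetical helper and not a tritonbench function.

```python
import statistics
from typing import Sequence


def format_latency(samples_ms: Sequence[float]) -> str:
    """Format repeated latency samples as 'median (+/-spread%)'."""
    median = statistics.median(samples_ms)
    # Assumption: spread is the largest absolute deviation from the median,
    # relative to the median. A standard deviation would work equally well.
    spread_pct = max(abs(sample - median) for sample in samples_ms) / median * 100.0
    return f"{median:.3f} (+/-{spread_pct:.2f}%)"


# Made-up samples in milliseconds, for illustration only.
cell = format_latency([2.0, 2.5, 1.5])
```

Here `cell` evaluates to `2.000 (+/-25.00%)`, which keeps both numbers in a single table column.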

Replace runners prefix amz2023.

Testing new runners.

jeanschmidt

cla signed

Refactor the t4 and a100 nightly workflows

- [ ] Design new PyTorch nightly testing workflows (T4 and A100) based on the PT2 Benchmark Runner
- [ ] Improve the bisection workflow to support the PT2 Benchmark Runner
-...

xuzhao9
