
TorchBench is a collection of open source benchmarks used to evaluate PyTorch performance.

Results 314 benchmark issues

We are now in a state where the existing models on the main branch are fairly stable, so we can run all models on the main branch instead of a v*...

Enabling Unet1d to track OP performance: conv_transpose1d. This PR enables Huggingface/Diffusers/UNET1D, which includes conv_transpose1d in its upsampling and downsampling phases. (In [#120982](https://github.com/pytorch/pytorch/issues/120982), a notable performance regression happens to...
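For context on the op being tracked: `conv_transpose1d` is the standard PyTorch operator for 1D transposed (upsampling) convolution. A minimal sketch of how a UNet-style upsampling step might invoke it — the shapes here are illustrative, not taken from the UNET1D model:

```python
import torch
import torch.nn.functional as F

# (batch, in_channels, length): a short 1D feature map to upsample
x = torch.randn(1, 32, 8)
# (in_channels, out_channels, kernel_size) for conv_transpose1d
w = torch.randn(32, 16, 4)

# stride=2 with padding=1 doubles the temporal length: (8-1)*2 - 2*1 + 4 = 16
y = F.conv_transpose1d(x, w, stride=2, padding=1)
print(tuple(y.shape))  # (1, 16, 16)
```

Tracking this op in TorchBench lets regressions like the one referenced above show up in per-operator performance data rather than only in end-to-end model latency.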


Summary: without it, each run purges svg files when previewed locally; no effect by default. Reviewed By: Yuzhen11. Differential Revision: D55453161


Test workflow: https://github.com/pytorch/benchmark/actions/runs/9793666962


Show more help messages for Tritonbench:

```
usage: run_benchmark.py [-h] [--op OP] [--mode {fwd,bwd,fwd_bwd}] [--bwd] [--fwd_bwd]
                        [--device DEVICE] [--warmup WARMUP] [--iter ITER] [--csv]
                        [--dump-csv] [--skip-print] [--plot] [--ci] [--metrics METRICS]
                        [--only...
```


Make the version used in https://github.com/pytorch/test-infra/blob/main/.github/actions/setup-nvidia/action.yml and https://github.com/pytorch/benchmark/blob/main/docker/infra/daemonset.yaml#L73 consistent

ARC Kubernetes mode has the highest security level among all runner modes. We can keep using CPU-only instances for docker building, but all GPU instances should use ARC Kubernetes mode....

Stack from [ghstack](https://github.com/ezyang/ghstack) (oldest at bottom): * #2349 * __->__ #2348 Overall context: Before looking further into the bf16xint4 matmul, I'm planning to look into a bf16xint16 matmul first. The...


Stack from [ghstack](https://github.com/ezyang/ghstack) (oldest at bottom): * __->__ #2349 * #2348 Modify the bf16xint16 kernel from the previous PR to actually do as intended: load the int16 input, convert it...
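The core idea of that kernel — load int16 data, convert it to bf16, then run a normal bf16 matmul — can be illustrated at the tensor level in eager PyTorch. This is only a sketch of the numerics, not the Triton kernel itself; the shapes and value range are made up for the example:

```python
import torch

# bf16 activations and int16 "weights" (stand-ins for the kernel's inputs)
a = torch.randn(64, 32, dtype=torch.bfloat16)
b_int16 = torch.randint(-8, 8, (32, 16), dtype=torch.int16)

# Convert the int16 operand to bf16 at load time, then do a standard bf16 matmul.
# In the Triton kernel this conversion happens per-tile inside the inner loop.
c = a @ b_int16.to(torch.bfloat16)
print(c.dtype, tuple(c.shape))  # torch.bfloat16 (64, 16)
```

The benefit measured by such a benchmark is memory bandwidth: the int16 operand is read at half the bytes of bf16 and widened on the fly, so the conversion cost trades against reduced memory traffic.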
