benchmark issues

TFLOPS calculation for TF32

The current TFLOPS calculation only covers FP32. We need to add support for other floating-point formats such as TF32 and FP16.
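For orientation, the arithmetic behind a TFLOPS figure does not depend on the number format. Only the source of the FLOP count (or the hardware counter being read) changes between FP32, TF32, and FP16. The sketch below assumes the FLOP count and elapsed time have already been measured; the function name `tflops` is hypothetical and is not part of the benchmark's API.

```python
def tflops(total_flops: float, elapsed_seconds: float) -> float:
    """Convert a FLOP count and wall-clock time into TFLOPS (hypothetical helper)."""
    if elapsed_seconds <= 0:
        raise ValueError("elapsed_seconds must be positive")
    # 1 TFLOPS = 1e12 floating-point operations per second.
    return total_flops / elapsed_seconds / 1e12


# Illustration only: a dense M x N x K matmul performs about 2 * M * N * K
# floating-point operations, regardless of the precision it runs in.
matmul_flops = 2 * 4096 ** 3
print(tflops(matmul_flops, elapsed_seconds=0.01))
```

Keeping the conversion separate from the per-format FLOP source would let each precision plug in its own counter without touching the formula.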

bug for DCGM fp32 metric collection

When there are multiple metrics, the DCGM API may return `-1` values for some metrics. In general, most metrics use `max` as the default aggregator function while our FP32 metric...

FindHao

bug
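The DCGM report above mentions `-1` values appearing among collected samples. A `max` aggregator is naturally unaffected by such a sentinel when valid readings are non-negative, but other aggregators (a mean, for example) would be skewed by it. The following is a minimal sketch of filtering the sentinel before aggregating. It assumes `-1` always means "no data", and the names `INVALID` and `aggregate` are hypothetical rather than taken from the benchmark code or the DCGM bindings.

```python
from statistics import mean

INVALID = -1  # assumption: the sentinel described in the issue means "no data"


def aggregate(samples, aggregator=max):
    """Apply `aggregator` to the samples after dropping sentinel values."""
    valid = [s for s in samples if s != INVALID]
    if not valid:
        return None  # nothing usable was collected
    return aggregator(valid)


samples = [2.0, INVALID, 4.0, INVALID]
print(aggregate(samples))        # max over the valid samples
print(aggregate(samples, mean))  # mean over the valid samples
```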

Move the reference model (for train correctness checks) to cpu before starting tests

Stack from [ghstack](https://github.com/ezyang/ghstack):
* __->__ #1275

On some OSS models we see CUDA OOM if we enable train correctness checks. For certain models, we can prevent this OOM by copying...

davidberard98

cla signed

CVE-2007-4559 Patch

1

# Patching CVE-2007-4559

Hi, we are security researchers from the Advanced Research Center at [Trellix](https://www.trellix.com). We have begun a campaign to patch a widespread bug named CVE-2007-4559. CVE-2007-4559 is a...

TrellixVulnTeam
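For context on the patch above: CVE-2007-4559 concerns path traversal when extracting archives with Python's `tarfile` module. The sketch below shows one common style of guard, checking every member's resolved path before extraction. It is not the patch from this pull request; the helper name `safe_extract` is hypothetical, and the check covers member names only (it does not inspect symlink or hardlink targets).

```python
import os
import tarfile


def safe_extract(tar: tarfile.TarFile, dest: str) -> None:
    """Extract `tar` into `dest`, refusing members that would land outside it."""
    dest_root = os.path.realpath(dest)
    for member in tar.getmembers():
        # Resolve where this member would be written on disk.
        target = os.path.realpath(os.path.join(dest_root, member.name))
        # Reject anything whose resolved path is not under the destination.
        if os.path.commonpath([dest_root, target]) != dest_root:
            raise ValueError(f"blocked path traversal: {member.name!r}")
    tar.extractall(dest_root)
```

Recent Python versions also provide extraction filters (for example `extractall(path, filter="data")`), which are generally preferable where available.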

torchbench: torch.no_grad() not working for dynamo inductor backend

4

The torch dynamo inductor backend does not see the `torch.no_grad()` setting from the torchbench framework. Reproducer: `python run_benchmark.py cpu --model hf_Bert --test eval --torchdynamo inductor`. Here is the code...

snadampal

Add torch.jit.trace option

1

Stack from [ghstack](https://github.com/ezyang/ghstack) (oldest at bottom):
* __->__ #2255

`python run.py resnet50 -d cuda -t train --backend torchscript_trace`

Differential Revision: [D56849015](https://our.internmc.facebook.com/intern/diff/D56849015)

tugsbayasgalan

cla signed

upload pt2 cprofile stats to manifold

Summary: https://fb.workplace.com/groups/257735836456307/permalink/657458576484029/

Upload cprofile to manifold. D56696397 has a script to convert profiler stats to dot graphs (see its test plan).

Differential Revision: D56679561

dshi7

cla signed

Add launch_latency to the CI

1

Test Plan: ``` $ python run_benchmark.py triton --ci { "name": "triton", "environ": { "pytorch_git_version": "734a000f16b60b3a4e18404e5047a467e2bc96d4", "pytorch_version": "2.4.0.dev20240425+cu121", "triton_version": "3.0.0+45fff310c8" }, "metrics": { "tritonbench_softmax[x_256-naive_softmax-gbps]": 186.1818167276619, "tritonbench_softmax[x_256-naive_softmax-latency]": 0.04505600035190582, "tritonbench_softmax[x_256-triton_softmax-gbps]": 546.1333471298221, "tritonbench_softmax[x_256-triton_softmax-latency]": 0.015359999611973763,...

xuzhao9

cla signed

RuntimeError When Enabling Accuracy Checks in yolov3 Training on GPU.

**Issue Description** I encounter a RuntimeError related to gradient computation when enabling accuracy checks during the training of yolov3 in a GPU docker environment. The training runs without issues when...

scshtyk

RuntimeError When Enabling Accuracy Checks in DALLE2_pytorch Training on GPU.

2

**Issue Description** I encounter a RuntimeError related to gradient computation when enabling accuracy checks during the training of DALLE2_pytorch in a GPU docker environment. The training runs without issues when...

cjxjxjx
