
Possible (-2 to 4%) regression in functorch_dp_cifar10_cuda model from 0.1.1 to latest

Open zou3519 opened this issue 3 years ago • 3 comments

To repro:

# setup pytorch/benchmark
git clone https://github.com/pytorch/benchmark
cd benchmark
# this doesn't need to complete successfully -- we just need to install torchbenchmark's basic dependencies.
python setup.py install

python run_benchmark.py functorch

On my machine with a ~v100~ P100 GPU, the runtime goes from 72ms to 83ms.

zou3519 avatar Oct 10 '22 14:10 zou3519

On A100s, seeing 48ms to 50ms, ~4% regression

samdow avatar Oct 10 '22 18:10 samdow

On AWS V100s, I'm seeing 53ms on 0.1.1, 50ms on 0.2.1, and 52ms on 1.13.

~4% regression from 0.2.1
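
For reference, the percentages quoted in this thread can be reproduced from the reported runtimes. This is only a sketch of the arithmetic: the helper name `pct_change` is made up for illustration, and the numbers are the ones posted above, not new measurements.

```python
def pct_change(old_ms: float, new_ms: float) -> float:
    """Relative change in runtime, in percent (positive means slower)."""
    return (new_ms - old_ms) / old_ms * 100.0


# Runtimes in ms as reported in the comments above: (label, old, new).
reported = [
    ("P100, 0.1.1 -> latest", 72, 83),
    ("A100, 0.1.1 -> latest", 48, 50),
    ("AWS V100, 0.1.1 -> 1.13", 53, 52),
    ("AWS V100, 0.2.1 -> 1.13", 50, 52),
]

for label, old_ms, new_ms in reported:
    print(f"{label}: {pct_change(old_ms, new_ms):+.1f}%")
```

On the AWS V100 numbers this gives roughly -2% when measured against 0.1.1 and 4% when measured against 0.2.1, which is where the range in the title comes from.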

samdow avatar Oct 10 '22 19:10 samdow

I redid the experiment with actual V100s on the FAIR cluster. The numbers are 75ms (0.1.1) -> 74ms (0.2.1) -> 72ms (1.13), which is not a regression.

On that note, I'm curious why the V100s on different systems have different performance -- maybe a difference in the CPUs? (Or CUDA version? My experiments were done with the PyTorch CUDA 10.2 binaries.)
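
To compare systems, it might help to record the CPU, GPU, and CUDA build next to each timing. Below is a minimal sketch, assuming PyTorch may or may not be installed; `collect_env` is a hypothetical helper name, not part of pytorch/benchmark. PyTorch also ships `python -m torch.utils.collect_env`, which prints a more complete report.

```python
import platform


def collect_env() -> dict:
    """Gather basic facts that could explain timing differences between machines."""
    info = {
        "python": platform.python_version(),
        # platform.processor() can be empty on some systems, so fall back to the arch.
        "cpu": platform.processor() or platform.machine(),
        "torch": None,
    }
    try:
        import torch
    except ImportError:
        # PyTorch is not installed; report only the host details.
        return info

    info["torch"] = torch.__version__
    # CUDA version the binary was built against (None for CPU-only builds).
    info["cuda_build"] = torch.version.cuda
    if torch.cuda.is_available():
        info["gpu"] = torch.cuda.get_device_name(0)
    return info


for key, value in collect_env().items():
    print(f"{key}: {value}")
```

Posting this output alongside each set of numbers would make it easier to tell whether the CPU or the CUDA build differs between the AWS and FAIR machines.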

zou3519 avatar Oct 10 '22 21:10 zou3519