functorch
Possible (-2 to 4%) regression in functorch_dp_cifar10_cuda model from 0.1.1 to latest
To repro:
# setup pytorch/benchmark
git clone https://github.com/pytorch/benchmark
cd benchmark
# this doesn't need to complete successfully -- we just need to install torchbenchmark's basic dependencies.
python setup.py install
python run_benchmark.py functorch
On my machine with a ~v100~ P100 GPU, the runtime goes from 72ms to 83ms.
On A100s, seeing 48ms to 50ms, ~4% regression
On AWS V100s, I'm seeing 53ms on 0.1.1, 50ms on 0.2.1, and 52ms on 1.13: a ~4% regression from 0.2.1.
I redid the experiment with actual V100s on the FAIR cluster; the numbers are 75ms (0.1.1) -> 74ms (0.2.1) -> 72ms (1.13), which is not a regression.
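For reference, here is a small sketch of how the percentages quoted above can be derived from the raw timings. The helper name `percent_change` and the row labels are just for illustration; the millisecond values are the ones reported in this issue.

```python
def percent_change(before_ms, after_ms):
    """Relative change in runtime, in percent. Positive means slower (a regression)."""
    return (after_ms - before_ms) / before_ms * 100.0


# (label, before_ms, after_ms), using the timings reported in this issue
timings = [
    ("P100", 72, 83),
    ("A100", 48, 50),
    ("AWS V100, 0.1.1 -> 1.13", 53, 52),
    ("AWS V100, 0.2.1 -> 1.13", 50, 52),
    ("FAIR V100, 0.1.1 -> 1.13", 75, 72),
]

for label, before_ms, after_ms in timings:
    print(f"{label}: {percent_change(before_ms, after_ms):+.1f}%")
```

With these inputs, the AWS V100 change is +4% relative to 0.2.1 but about -2% relative to 0.1.1, which is where the "-2 to 4%" range in the title comes from.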
On that note, I'm curious why the V100s on different systems have different performance -- maybe a difference in the CPUs? (Or CUDA version? My experiments were done with the pytorch cuda 10.2 binaries.)
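To help compare the two V100 systems, something like the following could be run on each machine and the output diffed. This is only a sketch: `describe_environment` is a hypothetical helper, and it records just a few of the things that might differ (CPU, PyTorch version, the CUDA version the binary was built with, GPU name).

```python
import platform


def describe_environment():
    """Collect a few details that might explain timing differences between machines."""
    info = {
        "python": platform.python_version(),
        "machine": platform.machine(),
        "processor": platform.processor(),
    }
    try:
        import torch
    except ImportError:
        # torch is not installed here; report only the host details
        info["torch"] = None
        return info

    info["torch"] = torch.__version__
    # CUDA version the PyTorch binary was built against (None for CPU-only builds)
    info["cuda_build"] = torch.version.cuda
    if torch.cuda.is_available():
        info["gpu"] = torch.cuda.get_device_name(0)
        info["cudnn"] = torch.backends.cudnn.version()
    return info


if __name__ == "__main__":
    for key, value in describe_environment().items():
        print(f"{key}: {value}")
```

PyTorch also ships `python -m torch.utils.collect_env`, which prints a more complete report.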