rocm_sdk_builder
pytorch cpu very slow
While evaluating rdna4/gfx12 support for #224, I noticed that PyTorch appears to be very slow on the CPU, at least as measured by the bundled pytorch_cpu_vs_gpu_simple_benchmark.sh benchmark.
The benchmark's test matrix is only 200x200, so to keep setup overhead from skewing the numbers I also tested with a larger 2000x2000 matrix.
| test | rocm_sdk_builder 633 WIP | pytorch nightly whl |
|---|---|---|
| CPU - 200x200 | 14.94 sec | 0.076 sec |
| GPU - 200x200 | FAIL | 0.680 sec |
| CPU - 2000x2000 | 160.12 sec | 12.11 sec |
| GPU - 2000x2000 | FAIL | 0.238 sec |
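
For anyone who wants to cross-check these numbers independently of the bundled script, here is a minimal timing sketch. It is not what pytorch_cpu_vs_gpu_simple_benchmark.sh does; the helper names `time_call` and `bench_matmul` are hypothetical, and the single warm-up call plus best-of-three timing is an assumption chosen to keep one-time setup cost out of the measurement.

```python
import time


def time_call(fn, warmup=1, repeats=3):
    """Return the best wall-clock time in seconds over `repeats` calls of fn()."""
    for _ in range(warmup):
        fn()  # untimed call: absorbs one-time setup cost
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        fn()
        best = min(best, time.perf_counter() - start)
    return best


def bench_matmul(size, device="cpu"):
    """Time one size x size matrix multiply on the given device (hypothetical helper)."""
    import torch  # imported lazily so time_call works without PyTorch installed

    a = torch.rand(size, size, device=device)
    b = torch.rand(size, size, device=device)

    def step():
        torch.mm(a, b)
        if device != "cpu":
            # GPU kernels launch asynchronously; wait so the timing is honest.
            # ROCm builds of PyTorch expose the GPU through the torch.cuda API.
            torch.cuda.synchronize()

    return time_call(step)
```

Calling `bench_matmul(200)` and `bench_matmul(2000)` in both environments should show whether the CPU gap persists once setup overhead is excluded; passing `device="cuda"` would exercise the GPU path on a build where it works.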