[release/2.5][ROCm][TunableOp] Improve identification of fastest solution (#144942)

Open naromero77amd opened this issue 8 months ago • 2 comments

This PR addresses some stability issues with identifying the fastest solution on AMD GPUs, particularly the MI300.

Changes include:

  • An improved timer, StreamTimerNoSync
  • More aggressive skipping of slow solutions
  • Additional statistics for diagnostics, available with PYTORCH_TUNABLEOP_VERBOSE=3
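
To illustrate the idea behind skipping slow solutions early, here is a minimal Python sketch. It is not the TunableOp implementation (which lives in PyTorch's C++ code); the function name `pick_fastest`, the `skip_factor` threshold, and the iteration counts are all hypothetical and chosen only for illustration. Each candidate gets one rough timing first, and the full timing loop runs only if that rough timing is not clearly worse than the best result so far.

```python
import time
from typing import Callable, Dict, Optional, Tuple


def pick_fastest(
    candidates: Dict[str, Callable[[], None]],
    warmup_iters: int = 1,
    tuning_iters: int = 10,
    skip_factor: float = 1.15,  # hypothetical threshold, not a TunableOp value
    timer: Callable[[], float] = time.perf_counter,
) -> Tuple[Optional[str], float]:
    """Return (name, average seconds per call) of the fastest candidate."""
    best_name: Optional[str] = None
    best_time = float("inf")

    for name, fn in candidates.items():
        # Warm up so one-time setup costs do not distort the measurement.
        for _ in range(warmup_iters):
            fn()

        # Rough single-shot timing, used only to decide whether to skip.
        start = timer()
        fn()
        approx = timer() - start
        if approx > skip_factor * best_time:
            # Clearly slower than the current best: skip the full timing loop.
            continue

        # Full timing: average over several iterations.
        start = timer()
        for _ in range(tuning_iters):
            fn()
        avg = (timer() - start) / tuning_iters

        if avg < best_time:
            best_name, best_time = name, avg

    return best_name, best_time
```

The `timer` parameter is injectable so the selection logic can be exercised with a deterministic clock. On a GPU, wall-clock timing like this would not be accurate without stream-aware timing, which is the part the improved timer in this PR addresses.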

Pull Request resolved: https://github.com/pytorch/pytorch/pull/144942
Approved by: https://github.com/jeffdaily

(cherry picked from commit fd0cd6a08f706b7bb1dedb296217b6441e4fb9ff)

naromero77amd avatar Apr 04 '25 00:04 naromero77amd

This is a performance improvement from upstream. So far, there have been no negative reports regarding performance, so I think it's worth backporting. I will also add it to ROCm release/2.6. It cannot be trivially backported to release/2.4.

naromero77amd avatar Apr 04 '25 00:04 naromero77amd

Jenkins build for acd66a22a6f79aa784015121cc22fa653ac1e9bb commit finished as FAILURE Links: Blue Ocean view / Build artifacts

Jenkins build for acd66a22a6f79aa784015121cc22fa653ac1e9bb commit is in progress Links: Blue Ocean view / Build artifacts

!cherry-pick --onto release/2.6

naromero77amd avatar Apr 21 '25 22:04 naromero77amd

Created branch autogenerated/release/2.6_cherry-pick_pr-2018 and https://github.com/ROCm/pytorch/pull/2041

rocm-mici avatar Apr 21 '25 23:04 rocm-mici