
Triton pin update for PyTorch 2.8 / Triton 3.4

iupaikov-amd opened this issue • 9 comments

This issue tracks tasks related to a possible Triton commit bump for PyTorch release/2.8.

It is meant to assess whether the Triton pin can be bumped for the upcoming PyTorch 2.8 release. I will link any related issues here.

Please list any associated issues and PRs in the comments for tracking purposes.

Details on the failures are available in the workflows of the testing PR: https://github.com/pytorch/pytorch/pull/155117

Triton commit currently under test: https://github.com/triton-lang/triton/commit/65648400d91f074770224ea38b732d7cba934f12

1146 failing tests in total:

High priority general issues

  • [ ] 1007 https://github.com/pytorch/pytorch/issues/154162 (@davidberard98)
  • [x] 1+ https://github.com/pytorch/pytorch/issues/154933 (@davidberard98)
  • [x] 1+ https://github.com/pytorch/pytorch/issues/154938 (@davidberard98)
  • [ ] https://github.com/pytorch/pytorch/issues/155047
  • [x] 1 https://github.com/pytorch/pytorch/issues/154223 -> https://github.com/pytorch/pytorch/pull/154894 (@davidberard98)
  • [x] https://github.com/triton-lang/triton/pull/7031 (@atalman)
  • [x] https://github.com/pytorch/pytorch/pull/154153 (@atalman)
  • [x] https://github.com/pytorch/pytorch/issues/154157 (@atalman)
  • [x] https://github.com/pytorch/pytorch/pull/155373 (@NikhilAPatel)
  • [ ] https://github.com/pytorch/pytorch/issues/155574 (@davidberard98)
  • [ ] https://github.com/pytorch/pytorch/issues/155584
  • [ ] https://github.com/pytorch/pytorch/issues/156028

General issues

  • [x] 8 https://github.com/pytorch/pytorch/issues/154250 (@davidberard98)

AMD specific

  • [x] 5 https://github.com/pytorch/pytorch/issues/155803
  • [x] 3 https://github.com/pytorch/pytorch/issues/154215
  • [x] 1 https://github.com/pytorch/pytorch/issues/154224 (can't reproduce on H100)

Follow-ups

  • [ ] https://github.com/pytorch/pytorch/issues/155856

cc @chauhang @penguinwu @bertmaher @int3 @davidberard98 @nmacchioni @chenyang78 @embg @peterbell10 @aakhundov

iupaikov-amd, May 23 '25

I would really appreciate it if someone could look into why the CUDA builds fail for the testing PR. Triton has some folder-structure changes, and the install_triton script fails on CUDA but not on AMD, for reasons I haven't pinned down yet. I would like to get a full CI run and cross-compare the results.

iupaikov-amd, May 23 '25

@iupaikov-amd can you please create a PR that updates the pin and see what fails, rather than creating numerous issues? It feels like some of them might be local setup problems.

malfet, May 23 '25

@davidberard98

desertfire, May 25 '25

re: CUDA build failures, I'm putting up https://github.com/pytorch/pytorch/pull/154635. I'm not sure yet whether this fixes all of the issues or just some of them.

davidberard98, May 29 '25

@anmyachev's https://github.com/pytorch/pytorch/pull/154905 fixes some recent _unwrap_if_constexpr failures

davidberard98, Jun 02 '25

Updated the Triton commit hash to 65648400d91f074770224ea38b732d7cba934f12 to try to fix the CUDA build issues in the testing PR.

iupaikov-amd, Jun 04 '25

Closed all irrelevant issues, i.e. those not caused by the Triton pin update or already handled internally. I refreshed the testing PR and got new results from CUDA as well. Next, I'm going to compile a list of failures, compare them between CUDA and ROCm, and then add tasks here if needed.

Edit: even with the updated build scripts, the CUDA images still fail to build. Sample run on a fresh main branch: https://github.com/pytorch/pytorch/actions/runs/15442199788/job/43462785692

iupaikov-amd, Jun 05 '25

I need to confirm whether https://github.com/pytorch/pytorch/issues/155803 is fixed by https://github.com/triton-lang/triton/pull/7163; if so, we are good to go for the cutoff. The CI run should be ready on Friday, I think.

iupaikov-amd, Jun 12 '25

Everything looks green on the AMD side with the latest CI run in the pinned testing PR. We are ready for the cutoff; the Triton team has merged all the relevant PRs.

@davidberard98 List of remaining CUDA failures:

  • test_compiled_flex_attention_full_model_ddp: massive exception message from the compiler subprocess
  • test_compiled_flex_attention_local_ddp: same as above
  • test_while_loop_with_mixed_device_dynamic_False_cpu_with_stack_allocation: unexpected success, probably not important
  • test_pointwise_op_with_tensor_of_scalarlist_overload__foreach_addcmul_is_fastpath_True_cuda_bfloat16: assertion failure at test_foreach.py line 94

Feel free to reach out to me on Slack if you need the full error messages.

iupaikov-amd, Jun 13 '25

Closing, as the pin update has been done.

malfet, Jul 09 '25