Triton pin update for PyTorch 2.8 / Triton 3.4
Creating this issue to track tasks related to a possible Triton commit bump for the upcoming PyTorch release/2.8. Any related issues will be linked here.
Please list any associated issues and PRs in the comments for tracking purposes.
Details on failures can be found in the workflows of the testing PR: https://github.com/pytorch/pytorch/pull/155117. Currently testing Triton commit: https://github.com/triton-lang/triton/commit/65648400d91f074770224ea38b732d7cba934f12
1146 total failing tests:
High priority general issues
- [ ] 1007 https://github.com/pytorch/pytorch/issues/154162 (@davidberard98)
- [x] 1+ https://github.com/pytorch/pytorch/issues/154933 (@davidberard98)
- [x] 1+ https://github.com/pytorch/pytorch/issues/154938 (@davidberard98)
- [ ] https://github.com/pytorch/pytorch/issues/155047
- [x] 1 https://github.com/pytorch/pytorch/issues/154223 -> https://github.com/pytorch/pytorch/pull/154894 (@davidberard98)
- [x] https://github.com/triton-lang/triton/pull/7031 (@atalman)
- [x] https://github.com/pytorch/pytorch/pull/154153 (@atalman)
- [x] https://github.com/pytorch/pytorch/issues/154157 (@atalman)
- [x] https://github.com/pytorch/pytorch/pull/155373 (@NikhilAPatel)
- [ ] https://github.com/pytorch/pytorch/issues/155574 (@davidberard98)
- [ ] https://github.com/pytorch/pytorch/issues/155584
- [ ] https://github.com/pytorch/pytorch/issues/156028
General issues
- [x] 8 https://github.com/pytorch/pytorch/issues/154250 (@davidberard98)
AMD specific
- [x] 5 https://github.com/pytorch/pytorch/issues/155803
- [x] 3 https://github.com/pytorch/pytorch/issues/154215
- [x] 1 https://github.com/pytorch/pytorch/issues/154224 (can't repro on h100)
Follow-ups
- [ ] https://github.com/pytorch/pytorch/issues/155856
cc @chauhang @penguinwu @bertmaher @int3 @davidberard98 @nmacchioni @chenyang78 @embg @peterbell10 @aakhundov
Would really appreciate it if someone could take a look at why the CUDA builds fail for the testing PR. There are some folder-structure changes in Triton, so the install_triton script fails for CUDA but not for AMD, for some reason. I would like to get a full CI run and do a cross-comparison.
@iupaikov-amd can you please create a PR that updates the pin and see what fails, rather than creating numerous issues? It feels like some of them might be local setup problems.
@davidberard98
Re: the CUDA build failures, I'm putting up https://github.com/pytorch/pytorch/pull/154635. I'm not sure yet whether this fixes all the issues or just some of them.
@anmyachev's https://github.com/pytorch/pytorch/pull/154905 fixes some recent _unwrap_if_constexpr failures
Updated the Triton commit hash to 65648400d91f074770224ea38b732d7cba934f12 to try to fix the CUDA build issues in the testing PR.
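For reference, the pin bump itself is just updating a commit-hash file in the PyTorch repo; a minimal sketch, assuming the pin lives at `.ci/docker/ci_commit_pins/triton.txt` (the path is an assumption here — verify against the current repo layout before use):

```shell
# Sketch: bump the pinned Triton commit that CI builds against.
# PIN_FILE path is assumed; check the pytorch repo for the actual location.
PIN_FILE=".ci/docker/ci_commit_pins/triton.txt"
NEW_COMMIT="65648400d91f074770224ea38b732d7cba934f12"

# Write the new pin (CI docker builds read this file to check out Triton).
mkdir -p "$(dirname "$PIN_FILE")"
echo "$NEW_COMMIT" > "$PIN_FILE"

# Show the updated pin for confirmation.
cat "$PIN_FILE"
```

After changing the pin, a testing PR like the one above is used to rebuild the CI images and surface any failures caused by the new commit.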
Closed all irrelevant issues that were not caused by the Triton pin update or are handled internally. Refreshed the testing PR and got new results from CUDA as well; going to compile a list of issues and compare them between CUDA and ROCm, then add tasks here as required.
Edit: even with the build scripts updated, the CUDA images still fail to build. Sample run on a fresh main branch: https://github.com/pytorch/pytorch/actions/runs/15442199788/job/43462785692
Need to confirm whether https://github.com/pytorch/pytorch/issues/155803 is fixed by https://github.com/triton-lang/triton/pull/7163; if so, we are good to go with the cutoff. The CI run should be ready by Friday, I think.
Everything looks green on the AMD side with the latest CI run in the pinned testing PR. We are ready for the cutoff; the Triton team has merged all the relevant PRs.
@davidberard98 List of remaining CUDA failures:
test_compiled_flex_attention_full_model_ddp - massive exception message from compiler subprocess
test_compiled_flex_attention_local_ddp - same as above
test_while_loop_with_mixed_device_dynamic_False_cpu_with_stack_allocation - unexpected success, seems not important
test_pointwise_op_with_tensor_of_scalarlist_overload__foreach_addcmul_is_fastpath_True_cuda_bfloat16 - assertion on test_foreach.py line 94
Feel free to reach out to me on Slack if you need the full error messages.
Closing, as the pin update has been done.