TransformerEngine
[C/PyTorch] Refactor and move userbuffers into TE/common
This PR moves all the userbuffers code in TE/pytorch to TE/common and refactors the interfaces to make TE/common/userbuffers accessible to all framework integrations.
To do:
- [x] Move userbuffers from TE/pytorch to TE/common.
- [x] Bootstrap userbuffers with PyTorch collectives.
- [x] Update build logic with CXX ABI version fix and correct rpaths.
- [x] Implement comm overlap example for PyTorch.
- [x] Verify `split_overlap_ag_p2p`
- [x] Verify `split_overlap_rs_p2p`
- [ ] Verify `split_overlap_rs`
- [ ] Verify `atomic_gemm_overlap_ag_p2p`
- [ ] Verify `atomic_gemm_overlap_rs_p2p`
- [ ] Verify `atomic_gemm_overlap_rs`
- [ ] Verify `bulk_overlap` for AG
- [ ] Verify `bulk_overlap` for RS
- [ ] Implement unit tests.
@timmoon10 FYI I will be removing the 3rd party dlpack package I introduced earlier in this PR. It's not needed for the PyTorch collective callbacks, and I can bring it back if it becomes necessary for JAX down the line (but I'd like to avoid it if I can).
This work has been moved to a new branch due to too many conflicts with TE/main. Closing the PR and filing a new one.