NeMo
Context Parallel SFT Support for dataset in THD format
What does this PR do?
This PR adds context parallel (CP) support for SFT on datasets in THD format. It is compatible with cu_seqlen_padded in the latest cuDNN fused attention.
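For reviewers unfamiliar with padded offsets in THD (packed) batches, here is a minimal sketch of how the two offset arrays relate. It is not the code in this PR. The helper name `build_cu_seqlens` is hypothetical, and padding each sequence to a multiple of `2 * cp_size` is an assumption about how sequences are split evenly across CP ranks.

```python
from itertools import accumulate


def build_cu_seqlens(seq_lens, cp_size):
    """Return (cu_seqlens, cu_seqlens_padded) for a packed THD batch.

    Hypothetical helper for illustration only.
    Assumption: each sequence is padded up to a multiple of 2 * cp_size.
    """
    multiple = 2 * cp_size
    # Round each sequence length up to the next multiple (ceiling division).
    padded_lens = [-(-n // multiple) * multiple for n in seq_lens]
    # Cumulative counts of real tokens per sequence.
    cu_seqlens = [0] + list(accumulate(seq_lens))
    # Cumulative padded lengths, i.e. where each sequence starts in memory.
    cu_seqlens_padded = [0] + list(accumulate(padded_lens))
    return cu_seqlens, cu_seqlens_padded


if __name__ == "__main__":
    cu, cu_padded = build_cu_seqlens([5, 3, 8], cp_size=2)
    print(cu)         # [0, 5, 8, 16]
    print(cu_padded)  # [0, 8, 12, 20]
```

In this sketch the unpadded array counts real tokens only, while the padded array gives the start offset of each sequence in the packed buffer, which is the kind of information a padded-offset argument to fused attention is meant to carry.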
PR Type:
- [x] New Feature
- [ ] Bugfix
- [ ] Documentation
If you haven't finished some of the above items, you can still open a "Draft" PR.