NeMo
Add large-model stable training fix and contrastive loss update for variable sequence lengths
What does this PR do?
This PR brings two updates:
- Fix the SSL contrastive loss so that the loss is computed for variable input lengths rather than derived from the subsampling factor.
- Add an option to remove the bias from the Linear and Conv layers inside Conformer layers, to support scaling to multi-billion-parameter training (currently tested with a 1B-parameter model using bf16).
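To illustrate the first point, here is a minimal, framework-free sketch of averaging a per-frame loss over valid frames only, using each utterance's own length. It is not the NeMo implementation: the function names `valid_frame_mask` and `masked_mean_loss` and the toy loss values are assumptions made for illustration.

```python
from typing import List


def valid_frame_mask(lengths: List[int], max_len: int) -> List[List[bool]]:
    """Build a (batch, max_len) mask: True for real frames, False for padding."""
    return [[t < n for t in range(max_len)] for n in lengths]


def masked_mean_loss(per_frame_loss: List[List[float]], lengths: List[int]) -> float:
    """Average per-frame losses over valid (non-padded) frames only.

    Hypothetical helper: `lengths` holds the true number of frames of each
    utterance, so padded positions never contribute to the loss.
    """
    max_len = max(len(row) for row in per_frame_loss)
    mask = valid_frame_mask(lengths, max_len)
    total = 0.0
    count = 0
    for row, row_mask in zip(per_frame_loss, mask):
        for value, keep in zip(row, row_mask):
            if keep:
                total += value
                count += 1
    return total / count if count else 0.0


# Two utterances padded to 4 frames; the second has only 2 real frames,
# so its last two (padding) values must be ignored.
losses = [[1.0, 1.0, 1.0, 1.0], [2.0, 2.0, 9.0, 9.0]]
result = masked_mean_loss(losses, [4, 2])
print(result)
```

With a fixed-length assumption, the padded values in the second row would leak into the average; masking by the true lengths keeps them out.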
Collection: ASR
Changelog
- Updated ssl_loss
- Added an argument to the Conformer layer to optionally remove bias; removal is disabled by default (default is False)
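As a rough sketch of the second changelog item, a single flag can be threaded through every Linear and Conv sub-layer of a Conformer layer so that bias is kept or dropped consistently. The flag name `use_bias`, the `LayerSpec` type, and the sub-layer names below are hypothetical and are not taken from the PR; the default keeps the bias, matching the "removal is off by default" behaviour described above.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class LayerSpec:
    name: str   # hypothetical sub-layer name
    kind: str   # "linear" or "conv"
    bias: bool  # whether this sub-layer carries a bias term


def conformer_layer_specs(use_bias: bool = True) -> List[LayerSpec]:
    """Describe the Linear/Conv sub-layers of one Conformer layer.

    Illustrative only: one flag controls the bias of every sub-layer,
    so bias cannot be removed from some layers and left in others.
    """
    sublayers = [
        ("feed_forward1.linear_in", "linear"),
        ("feed_forward1.linear_out", "linear"),
        ("self_attn.linear_qkv", "linear"),
        ("self_attn.linear_out", "linear"),
        ("conv.pointwise_conv1", "conv"),
        ("conv.depthwise_conv", "conv"),
        ("conv.pointwise_conv2", "conv"),
        ("feed_forward2.linear_in", "linear"),
        ("feed_forward2.linear_out", "linear"),
    ]
    return [LayerSpec(name, kind, use_bias) for name, kind in sublayers]


default_specs = conformer_layer_specs()                  # bias kept everywhere
no_bias_specs = conformer_layer_specs(use_bias=False)    # bias removed everywhere
print(sum(spec.bias for spec in default_specs), sum(spec.bias for spec in no_bias_specs))
```

In a real model, each spec's `bias` value would be passed to the corresponding layer constructor.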
GitHub Actions CI
The Jenkins CI system has been replaced by GitHub Actions self-hosted runners.
The GitHub Actions CI will run automatically when the "Run CICD" label is added to the PR. To re-run CI remove and add the label again. To run CI on an untrusted fork, a NeMo user with write access must first click "Approve and run".
Before your PR is "Ready for review"
Pre checks:
- [ ] Make sure you read and followed Contributor guidelines
- [ ] Did you write any new necessary tests?
- [ ] Did you add or update any necessary documentation?
- [ ] Does the PR affect components that are optional to install? (Ex: Numba, Pynini, Apex etc)
- [ ] Reviewer: Does the PR have correct import guards for all optional libraries?
PR Type:
- [x] New Feature
- [x] Bugfix
- [ ] Documentation