add large model stable training fix and contrastive loss update for variable seq

Open nithinraok opened this issue 9 months ago • 0 comments

What does this PR do ?

This PR brings two updates:

Fix the SSL contrastive loss so that the loss is computed for variable input lengths, rather than being derived from the subsampling factor
Add an option to remove the bias from Linear and Conv layers in Conformer layers, to support scaling to multi-billion-parameter training (currently tested with a 1B-parameter model using bf16)
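
For readers unfamiliar with the first item, the general idea of a length-aware loss can be sketched as below. This is not the code from this PR: `masked_mean_loss` and its arguments are hypothetical names, and the sketch assumes the per-step loss values and the per-utterance valid lengths are already available. It only illustrates averaging over valid (non-padded) steps instead of over a fixed, padded step count.

```python
from typing import List


def masked_mean_loss(per_step_loss: List[List[float]], lengths: List[int]) -> float:
    """Average per-step losses over valid steps only (hypothetical helper).

    per_step_loss: one list of per-step loss values per utterance, padded
                   to a common length.
    lengths:       number of valid (non-padded) steps for each utterance.
    """
    total = 0.0
    count = 0
    for losses, length in zip(per_step_loss, lengths):
        valid = losses[:length]  # drop the padded tail of this utterance
        total += sum(valid)
        count += len(valid)
    # Guard against an empty batch so we never divide by zero.
    return total / max(count, 1)


# Two utterances padded to 3 steps; 9.0 marks loss values on padded steps.
batch = [[1.0, 2.0, 9.0], [3.0, 9.0, 9.0]]
valid_lengths = [2, 1]

length_aware = masked_mean_loss(batch, valid_lengths)
padded_mean = sum(sum(row) for row in batch) / sum(len(row) for row in batch)
```

In this toy batch the padded steps dominate the naive mean, while the length-aware mean only reflects the three valid steps.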

Collection: ASR

Changelog

Updated ssl_loss
Added an argument to ConformerLayer to optionally remove bias; the default is False
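
As a rough illustration of what an optional-bias switch looks like in PyTorch, the sketch below builds a Linear and a Conv1d layer with or without bias. The helper `make_layers` and the flag name `use_bias` are assumptions made for this example, not the argument name or code used in this PR; only the `bias` keyword of `torch.nn.Linear` and `torch.nn.Conv1d` is standard PyTorch.

```python
import torch.nn as nn


def make_layers(d_model: int, kernel_size: int, use_bias: bool):
    """Build a Linear and a Conv1d layer, optionally without bias terms.

    `use_bias` is a hypothetical flag name chosen for this sketch.
    """
    linear = nn.Linear(d_model, d_model, bias=use_bias)
    conv = nn.Conv1d(
        d_model,
        d_model,
        kernel_size,
        padding=kernel_size // 2,  # keep the time dimension unchanged for odd kernels
        bias=use_bias,
    )
    return linear, conv


def count_params(*modules: nn.Module) -> int:
    """Total number of parameters across the given modules."""
    return sum(p.numel() for m in modules for p in m.parameters())


with_bias = make_layers(d_model=4, kernel_size=3, use_bias=True)
without_bias = make_layers(d_model=4, kernel_size=3, use_bias=False)

n_with = count_params(*with_bias)
n_without = count_params(*without_bias)
```

With bias disabled, both layers expose `bias` as `None` and contribute only their weight tensors.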

GitHub Actions CI

The Jenkins CI system has been replaced by GitHub Actions self-hosted runners.

The GitHub Actions CI will run automatically when the "Run CICD" label is added to the PR. To re-run CI remove and add the label again. To run CI on an untrusted fork, a NeMo user with write access must first click "Approve and run".

Before your PR is "Ready for review"

Pre checks:

[ ] Make sure you read and followed Contributor guidelines
[ ] Did you write any new necessary tests?
[ ] Did you add or update any necessary documentation?
[ ] Does the PR affect components that are optional to install? (Ex: Numba, Pynini, Apex etc)
- [ ] Reviewer: Does the PR have correct import guards for all optional libraries?

PR Type:

[x] New Feature
[x] Bugfix
[ ] Documentation

May 20 '24 18:05 nithinraok
