add large model stable training fix and contrastive loss update for variable seq

Open nithinraok opened this issue 9 months ago • 0 comments

What does this PR do?

This PR brings two updates:

  • Fix the SSL contrastive loss to support loss computation for variable input lengths, rather than computation based on the subsampling factor
  • Add an option to remove bias from the Linear and Conv layers in Conformer layers, to support scaling to multi-billion-parameter training (currently tested with a 1B-parameter model using bf16)
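
As a rough illustration of the first item, the sketch below averages per-frame losses over each utterance's valid frames only, using explicit lengths instead of a fixed frame count. It is not the code in this PR: the function name `masked_mean_loss`, its arguments, and the list-based data layout are hypothetical and chosen only to show the idea of length-aware loss reduction.

```python
def masked_mean_loss(frame_losses, lengths):
    """Average per-frame losses over valid (non-padded) frames only.

    frame_losses: one list of per-frame losses per utterance, padded to a
                  common length (hypothetical layout, for illustration).
    lengths:      number of valid frames in each utterance.
    """
    total = 0.0
    count = 0
    for losses, n in zip(frame_losses, lengths):
        if n > len(losses):
            raise ValueError("length exceeds padded sequence length")
        total += sum(losses[:n])  # padded positions are ignored
        count += n
    return total / count if count else 0.0


padded = [
    [1.0, 2.0, 3.0, 0.0],  # 3 valid frames, 1 padded
    [4.0, 0.0, 0.0, 0.0],  # 1 valid frame, 3 padded
]
print(masked_mean_loss(padded, [3, 1]))  # 2.5
```

Normalizing by the number of valid frames, rather than by the padded length, keeps short utterances in a batch from diluting the loss.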

Collection: ASR

Changelog

  • Updated ssl_loss
  • Added an argument to the Conformer layer to optionally remove bias; the default is False
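
The optional bias removal could look roughly like the following PyTorch sketch. This is not NeMo's Conformer layer: the class `FeedForwardSketch`, the helper `count_bias_params`, and the flag name `use_bias` (including its polarity and default) are assumptions made for illustration. Only `nn.Linear` and `nn.Conv1d` with their standard `bias` keyword are real PyTorch APIs.

```python
import torch
from torch import nn


class FeedForwardSketch(nn.Module):
    """Hypothetical block showing one flag shared by Linear and Conv layers."""

    def __init__(self, d_model: int, d_ff: int, use_bias: bool = True):
        super().__init__()
        # The same (assumed) flag is passed to every Linear and Conv layer.
        self.linear1 = nn.Linear(d_model, d_ff, bias=use_bias)
        self.activation = nn.SiLU()
        self.linear2 = nn.Linear(d_ff, d_model, bias=use_bias)
        self.pointwise_conv = nn.Conv1d(d_model, d_model, kernel_size=1, bias=use_bias)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, time, d_model)
        x = self.linear2(self.activation(self.linear1(x)))
        # Conv1d expects (batch, channels, time), so transpose around it.
        return self.pointwise_conv(x.transpose(1, 2)).transpose(1, 2)


def count_bias_params(module: nn.Module) -> int:
    """Count parameters whose name ends in 'bias'."""
    return sum(p.numel() for name, p in module.named_parameters() if name.endswith("bias"))


with_bias = FeedForwardSketch(8, 32, use_bias=True)
no_bias = FeedForwardSketch(8, 32, use_bias=False)
print(count_bias_params(with_bias), count_bias_params(no_bias))  # 48 0
```

Threading a single flag through the constructor keeps the bias-free variant opt-in, so existing configurations and checkpoints are unaffected unless the flag is changed.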

GitHub Actions CI

The Jenkins CI system has been replaced by GitHub Actions self-hosted runners.

The GitHub Actions CI will run automatically when the "Run CICD" label is added to the PR. To re-run CI remove and add the label again. To run CI on an untrusted fork, a NeMo user with write access must first click "Approve and run".

Before your PR is "Ready for review"

Pre checks:

  • [ ] Make sure you read and followed Contributor guidelines
  • [ ] Did you write any new necessary tests?
  • [ ] Did you add or update any necessary documentation?
  • [ ] Does the PR affect components that are optional to install? (Ex: Numba, Pynini, Apex etc)
    • [ ] Reviewer: Does the PR have correct import guards for all optional libraries?

PR Type:

  • [x] New Feature
  • [x] Bugfix
  • [ ] Documentation

nithinraok, May 20 '24 18:05