[ASR] Adding Speech separation models to NeMo

Open krishnacpuvvada opened this issue 3 years ago • 4 comments

What does this PR do?

Adds a speech separation model to NeMo.

Collection: ASR

Changelog

  • High-level script for speech separation: examples/speech_tasks/separation/speech_separation.py
  • Sample config file: examples/speech_tasks/separation/conf/sep_transformer.yaml
  • Separation model: nemo/collections/asr/models/ss_model.py
  • Pre-processor: AudioToFeaturesConvPreprocessor in nemo/collections/asr/modules/audio_preprocessing.py
  • Decoder: nemo/collections/asr/modules/ss_decoder.py
  • Loss: nemo/collections/asr/losses/ss_losses/si_snr.py
  • Dataset: nemo/collections/asr/data/audio_to_audio.py
  • Inference: EncDecSpeechSeparationModel.extract_sources() in nemo/collections/asr/models/ss_model.py
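
For readers unfamiliar with the SI-SNR (scale-invariant signal-to-noise ratio) objective named in the loss file above, here is a minimal sketch of the standard formula in plain Python. It is not the code from si_snr.py: the function name, the eps default, and the list-based signature are assumptions made for illustration, and a training loss would typically be the negative of this value, averaged over sources and batch.

```python
import math
from typing import Sequence


def si_snr(estimate: Sequence[float], target: Sequence[float], eps: float = 1e-8) -> float:
    """Scale-invariant SNR in dB between two equal-length 1-D signals (illustrative only)."""
    if len(estimate) != len(target) or len(target) == 0:
        raise ValueError("estimate and target must be non-empty and of equal length")

    # Remove the mean of each signal so a constant offset does not affect the score.
    est_mean = sum(estimate) / len(estimate)
    tgt_mean = sum(target) / len(target)
    est = [e - est_mean for e in estimate]
    tgt = [t - tgt_mean for t in target]

    # Project the estimate onto the target; the projection is the "signal" part.
    dot = sum(e * t for e, t in zip(est, tgt))
    tgt_energy = sum(t * t for t in tgt)
    scale = dot / (tgt_energy + eps)
    projection = [scale * t for t in tgt]

    # Whatever remains after removing the projection is treated as noise.
    proj_energy = sum(p * p for p in projection)
    noise_energy = sum((e - p) ** 2 for e, p in zip(est, projection))

    return 10.0 * math.log10(proj_energy / (noise_energy + eps) + eps)
```

Because the estimate is projected onto the target before the energies are compared, rescaling the estimate leaves the score essentially unchanged, which is the property that gives the measure its name.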

Usage

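# imports (the module path for the separation model follows the changelog above)
from omegaconf import OmegaConf
from nemo.collections.asr.models import ASRModel
from nemo.collections.asr.models.ss_model import EncDecSpeechSeparationModel

# load model: read the config stored in the .nemo checkpoint, build the model, then load its weights
cfg = OmegaConf.create({'init_from_nemo_model': '<path_to_nemo_ckpt>'})
model_cfg = ASRModel.restore_from(restore_path=cfg.init_from_nemo_model, return_config=True)
sep_model = EncDecSpeechSeparationModel(cfg=model_cfg)
sep_model.maybe_init_from_pretrained_checkpoint(cfg, map_location='cpu')

# run inference
paths2audio_files = ['<audio_mixture_to_separate>']  # list of paths to the audio mixtures
sep_model.extract_sources(
    paths2audio_files=paths2audio_files,
    save_dir='<path_to_save_separated_sources>',
    orig_sr=16000,
    num_sources=2,
    batch_size=1,
    num_workers=1,
)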

Before your PR is "Ready for review"

Pre checks:

  • [x] Make sure you read and followed Contributor guidelines
  • [ ] Did you write any new necessary tests?
  • [x] Did you add or update any necessary documentation?
  • [ ] Does the PR affect components that are optional to install? (Ex: Numba, Pynini, Apex etc)
    • [ ] Reviewer: Does the PR have correct import guards for all optional libraries?

PR Type:

  • [x] New Feature
  • [ ] Bugfix
  • [ ] Documentation

krishnacpuvvada avatar Jul 25 '22 23:07 krishnacpuvvada

This pull request introduces 60 alerts when merging f848b7cef8345ff785cff535c54b7bed8b10e583 into c324499a46587ebabeddc3c21d4da514820bf4c8 - view on LGTM.com

new alerts:

  • 54 for Unused import
  • 5 for Explicit export is not defined
  • 1 for 'import *' may pollute namespace

lgtm-com[bot] avatar Jul 25 '22 23:07 lgtm-com[bot]

This pull request introduces 60 alerts when merging d4b633318efb23d4283ad9563ad305685ae64f57 into c324499a46587ebabeddc3c21d4da514820bf4c8 - view on LGTM.com

new alerts:

  • 54 for Unused import
  • 5 for Explicit export is not defined
  • 1 for 'import *' may pollute namespace

lgtm-com[bot] avatar Jul 26 '22 00:07 lgtm-com[bot]

This pull request introduces 60 alerts when merging ec937f43a2d6007852543f7fb58b80414c47b23b into 7890979f622136cf1ba7ff558fb11a092c77700e - view on LGTM.com

new alerts:

  • 54 for Unused import
  • 5 for Explicit export is not defined
  • 1 for 'import *' may pollute namespace

lgtm-com[bot] avatar Jul 26 '22 17:07 lgtm-com[bot]

This pull request introduces 60 alerts when merging fb36031bd1f4ee32fae9d9b4bd8c5b09f36694fe into 7890979f622136cf1ba7ff558fb11a092c77700e - view on LGTM.com

new alerts:

  • 54 for Unused import
  • 5 for Explicit export is not defined
  • 1 for 'import *' may pollute namespace

lgtm-com[bot] avatar Jul 26 '22 17:07 lgtm-com[bot]

This PR is stale because it has been open for 30 days with no activity. Remove the stale label, comment, or update the PR, or it will be closed in 7 days.

github-actions[bot] avatar Oct 06 '22 02:10 github-actions[bot]