Multiscale Diarization Decoder (MSDD) model and module files

Open tango4j opened this issue 3 years ago • 1 comments

Signed-off-by: Taejin Park [email protected]

What does this PR do ?

Adds the multiscale diarization decoder (MSDD) model, its module, and the accompanying YAML configuration files.

  • msdd_models.py

Three classes are included:

(1) ClusterEmbedding: handles cluster-average embeddings by launching the clustering diarizer class. Its methods read the embedding vectors and the clustering output written by the clustering diarizer class.

(2) EncDecDiarLabelModel: the MSDD model class.

(3) OverlapAwareDiarizer: launches the clustering diarizer, runs MSDD inference, and then evaluates the result. It integrates pairwise inference from MSDD with split-infer, which breaks the input down into short sequences.
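
The two data-handling ideas named above, cluster-average embeddings and splitting the input into short sequences, can be illustrated with a minimal sketch. This is not the code in msdd_models.py: the function names `cluster_average_embeddings` and `split_into_chunks` are hypothetical, the sketch assumes a single scale with plain Python lists, and it only shows the arithmetic.

```python
from collections import defaultdict
from typing import Dict, List, Sequence


def cluster_average_embeddings(
    embeddings: Sequence[Sequence[float]], labels: Sequence[int]
) -> Dict[int, List[float]]:
    """Hypothetical helper: average the embedding vectors of each cluster label."""
    if len(embeddings) != len(labels):
        raise ValueError("embeddings and labels must have the same length")
    sums: Dict[int, List[float]] = {}
    counts: Dict[int, int] = defaultdict(int)
    for vec, label in zip(embeddings, labels):
        if label not in sums:
            sums[label] = [0.0] * len(vec)
        # Accumulate the element-wise sum for this cluster.
        sums[label] = [s + v for s, v in zip(sums[label], vec)]
        counts[label] += 1
    return {label: [s / counts[label] for s in total] for label, total in sums.items()}


def split_into_chunks(sequence: Sequence, chunk_len: int) -> List[Sequence]:
    """Hypothetical helper: cut a long sequence into consecutive chunks of at most chunk_len items."""
    if chunk_len <= 0:
        raise ValueError("chunk_len must be positive")
    return [sequence[i:i + chunk_len] for i in range(0, len(sequence), chunk_len)]
```

In this sketch the last chunk may be shorter than `chunk_len`; whether the actual split-infer pads or drops such a remainder is not stated in this PR description.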

  • msdd_diarizer.py: the MSDD neural model class.

  • speaker_utils.py: the audio_rttm_map() function is modified to handle a uniq_id that carries duration information.

  • audio_to_diar_label.py, collections.py

A couple of very small changes were needed in these files, for example rttm_files -> audio_files (uniq_id must still be extracted when no RTTM file is provided for evaluation).
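
As a rough illustration of why the key moves from rttm_files to audio_files, a uniq_id can be derived from the audio path alone, so it is available even without an RTTM file. The helper name `uniq_id_from_audio_path` is hypothetical, and the assumption that uniq_id is the file's base name without its extension is not confirmed by this PR description.

```python
import os


def uniq_id_from_audio_path(audio_filepath: str) -> str:
    """Hypothetical helper: use the audio file's base name (extension removed) as the uniq_id."""
    return os.path.splitext(os.path.basename(audio_filepath))[0]
```

Only the final extension is stripped here, so a file named `session.part1.wav` would map to `session.part1`.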

  • 4 x YAML files
    • multiscale_diar_dec_infer.domain.yaml -> for inference
    • multiscale_diar_dec.domain.yaml -> for training

The names of these files are up for debate.

  • The tutorial and the script for training data generation are excluded because they would make the PR too big.

Collection: ASR

Changelog

  • Add specific line by line info of high level changes in this PR.

Usage

  • Use the following command to run MSDD.
HYDRA_FULL_ERROR=1 python $BASEPATH/msdd_decoding.py --config-path='conf' --config-name='diarization_decoder.telephonic.yaml' \
    diarizer.manifest_filepath=<TEST_MANIFEST> \
    diarizer.out_dir=<TEST_EMB_DIR> \
    diarizer.msdd_model.model_path=<MSDD_MODEL_PATH>
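
The `diarizer.manifest_filepath` argument points at a manifest with one JSON object per line. A minimal sketch of producing such a line is shown here; the helper name `manifest_line` is hypothetical, and the field names are an assumption based on the layout commonly used for NeMo diarization manifests, so they should be checked against the YAML config used with this PR.

```python
import json
from typing import Optional


def manifest_line(
    audio_filepath: str,
    rttm_filepath: Optional[str] = None,
    num_speakers: Optional[int] = None,
) -> str:
    """Hypothetical helper: build one JSON manifest line for a single audio file."""
    entry = {
        "audio_filepath": audio_filepath,
        "offset": 0,
        "duration": None,  # assumed to mean "use the whole file"
        "label": "infer",
        "text": "-",
        "num_speakers": num_speakers,
        "rttm_filepath": rttm_filepath,  # None when no reference RTTM is available
    }
    return json.dumps(entry)
```

Writing one such line per recording to a text file would give a candidate value for `<TEST_MANIFEST>`.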

Before your PR is "Ready for review"

Pre checks:

  • [x] Make sure you read and followed Contributor guidelines
  • [ ] Did you write any new necessary tests?
  • [ ] Did you add or update any necessary documentation?
  • [ ] Does the PR affect components that are optional to install? (Ex: Numba, Pynini, Apex etc)
    • [ ] Reviewer: Does the PR have correct import guards for all optional libraries?

PR Type:

  • [x] New Feature
  • [ ] Bugfix
  • [ ] Documentation

If you haven't finished some of the above items you can still open "Draft" PR.

Who can review?

Anyone in the community is free to review the PR once the checks have passed. Contributor guidelines contains specific people who can review PRs to various areas.

Additional Information

  • Related to # (issue)

tango4j • Aug 02 '22 02:08

This pull request introduces 1 alert when merging 8cfb26d2fc2636ffc3dd774067c32284a1418c7e into f921ebe0436e55f7547b183ca83a623f6678422d - view on LGTM.com

new alerts:

  • 1 for Unused local variable

lgtm-com[bot] • Aug 09 '22 23:08

This pull request introduces 3 alerts when merging a9127a08918cbbee062ad279f3449ac4ba9c8122 into f8ca550967a83473aa2c20267690ac59c4fb640f - view on LGTM.com

new alerts:

  • 2 for Unused import
  • 1 for Wrong name for an argument in a call

lgtm-com[bot] • Aug 15 '22 08:08

This pull request introduces 4 alerts when merging 26fcd9d4db873388e752144533b8680aedd85876 into 193a0c3216f14ce6e9ca6dd4934b14788a63cb64 - view on LGTM.com

new alerts:

  • 3 for Unused import
  • 1 for Redundant assignment

lgtm-com[bot] • Aug 16 '22 03:08