Add SpeechLM to main
What does this PR do?
Add SpeechLLM training/inference scripts to NeMo, along with datasets, models, examples, and tests.
Main features
- A model class for the SALM-style architecture, which supports SFT & PEFT.
- Auxiliary modules to support multi-layer feature extraction and multiple audio encoders.
- A dataset class for audio-text question-answering tasks (generalized to any audio-to-text task).
- Detailed examples of training and evaluating SpeechLLMs.
- Minor updates to the Megatron code to work with SpeechLLM, removing some hard assumptions (e.g., `assert`, `strict=True`), and minor updates to data utils that move dict data to CUDA and split it into micro-batches.
Collection: [common,nlp,multimodal]
PR Type:
- [x] New Feature
- [ ] Bugfix
- [ ] Documentation
@titu1994 @nithinraok could you please take another look to see if your comments have been addressed? Thanks~
Steve, can you look at the CodeQL comments?
@titu1994 @zhehuaichen I've refactored the dataset so that the input and output keys can be configured dynamically by setting `context_key` and `answer_key` in the dataset. For example, if we want to use `input_text` and `output_text` as the text input and output keys in the manifest, we can set `context_key='input_text'` and `answer_key='output_text'`. The defaults are `context` and `answer`, and I also added a backward-compatibility check for the old `question` field.
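For illustration, here's a minimal sketch of a manifest entry with custom keys and the matching config overrides (the `audio_filepath`/`duration` fields and the `model.data.train_ds.*` paths are illustrative assumptions, not the exact schema):

```python
import json

# Hypothetical manifest entry (JSON lines) using custom keys instead of the
# default `context` / `answer` fields.
entry = {
    "audio_filepath": "/data/sample.wav",   # assumed field name
    "duration": 3.2,                        # assumed field name
    "input_text": "Transcribe the audio.",  # used as the model context
    "output_text": "hello world",           # used as the target answer
}
with open("manifest.json", "w") as f:
    f.write(json.dumps(entry) + "\n")

# The dataset would then point at these keys, e.g. via overrides such as:
#   model.data.train_ds.context_key=input_text
#   model.data.train_ds.answer_key=output_text
```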
@zhehuaichen FYI I removed the random context training trick from the dataset, since it only makes sense for word-boosting and not other tasks. It's better to actually generate those word-boosting manifests instead of doing the trick, which may hurt other tasks.
Sounds good on removing it, but is it possible to still include that part of the training in the release ckpt?
@zhehuaichen Yes, the checkpoint trained for release has that included.
jenkins
@titu1994 how do I invoke the CI tests? I tried `jenkins` but it didn't seem to work...
@aklife97 Could you please review the changes to the NLP collection? There are mainly two changes:
- Modifying `get_iterator_k_split` to support splitting non-tensor objects (e.g., lists), while the behavior stays the same as before if the batch only has tensor objects (see the sketch below).
- Changing the hard `assert` to a warning when loading adapters that have different params from the actual adapters in the LLM. This is needed since we store the ASR encoders in the same checkpoint as the GPT adapters, which leads to additional params when loading the adapter checkpoint for the GPT adapter.
Please let me know if you have any questions, thanks~!
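To make the first change concrete, here is a rough sketch of the intended splitting behavior (not the actual `get_iterator_k_split` implementation; the helper name and the divisibility check are mine):

```python
from typing import Any, Dict, List

import torch


def split_batch_into_microbatches(batch: Dict[str, Any], k: int) -> List[Dict[str, Any]]:
    """Split a batch dict into k micro-batches.

    Tensors are split along dim 0 (same behavior as before); non-tensor values
    such as lists are sliced into k equal chunks instead of being rejected.
    """
    micro_batches: List[Dict[str, Any]] = [{} for _ in range(k)]
    for key, value in batch.items():
        if isinstance(value, torch.Tensor):
            chunks = torch.tensor_split(value, k, dim=0)
        else:
            assert len(value) % k == 0, "batch size must be divisible by k"
            step = len(value) // k
            chunks = [value[i * step:(i + 1) * step] for i in range(k)]
        for mb, chunk in zip(micro_batches, chunks):
            mb[key] = chunk
    return micro_batches


# Tensors and lists end up split consistently across the micro-batches.
batch = {
    "tokens": torch.zeros(4, 8, dtype=torch.long),
    "metadata": [{"id": i} for i in range(4)],
}
for mb in split_batch_into_microbatches(batch, 2):
    print(mb["tokens"].shape, mb["metadata"])
```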
Hi @ericharper, could you please help review (or assign someone else available to review) the small changes to the NLP collection? There are mainly two changes:
- Modifying `get_iterator_k_split` to support splitting non-tensor objects (e.g., lists), while the behavior stays the same as before if the batch only has tensor objects.
- Changing the hard `assert` to a warning when loading adapters that have different params from the actual adapters in the LLM. This is needed since we store the ASR encoders in the same checkpoint as the GPT adapters, which leads to additional params when loading the adapter checkpoint for the GPT adapter.
Please let me know if you have any questions, thanks~!
@arendu, as suggested by Abhinav, could you please help review the small changes to the NLP collection? There are mainly two changes:
- Modifying `get_iterator_k_split` to support splitting non-tensor objects (e.g., lists), while the behavior stays the same as before if the batch only has tensor objects.
- Changing the hard `assert` to a warning when loading adapters that have different params from the actual adapters in the LLM. This is needed since we store the ASR encoders in the same checkpoint as the GPT adapters, which leads to additional params when loading the adapter checkpoint for the GPT adapter (see the sketch below).
Please let us know if you have any questions, thanks~!
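To illustrate the second change, here's a minimal sketch of the intended adapter-loading behavior (the helper name and signature are hypothetical, not the actual NeMo code):

```python
import logging

import torch.nn as nn


def load_adapter_state(module: nn.Module, state_dict: dict) -> None:
    """Load an adapter state dict without a hard assert on key mismatches.

    Extra keys (e.g., ASR encoder weights stored in the same checkpoint as the
    GPT adapters) only trigger a warning instead of failing the load.
    """
    missing, unexpected = module.load_state_dict(state_dict, strict=False)
    if missing:
        logging.warning(f"Missing adapter params: {missing}")
    if unexpected:
        logging.warning(f"Unexpected params in adapter checkpoint (ignored): {unexpected}")
```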