He Huang (Steve) comments

Results: 36 comments



Question about the settings in speech_data_simulator

> 1. How can I ensure that every session has exactly as many speakers as `num_speakers`?

We need a little more time to figure out why `enforce_num_speakers: true` is not...
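Until that is resolved, you can at least detect which generated sessions miss the target. The sketch below assumes the simulator writes one standard RTTM file per session, where field 8 of each `SPEAKER` line is the speaker label. `check_speaker_counts` is a hypothetical helper, not part of NeMo. It prints every session whose number of distinct speakers differs from the expected `num_speakers`.

```bash
# Hypothetical helper (not part of NeMo): flag sessions whose RTTM file
# contains a different number of distinct speakers than expected.
# Usage: check_speaker_counts <num_speakers> file1.rttm [file2.rttm ...]
check_speaker_counts() {
  local expected=$1
  shift
  local rttm n
  for rttm in "$@"; do
    # Count unique speaker labels (RTTM field 8) on SPEAKER lines only.
    n=$(awk '$1 == "SPEAKER" && !seen[$8]++ { n++ } END { print n + 0 }' "$rttm")
    if [ "$n" -ne "$expected" ]; then
      echo "$(basename "$rttm" .rttm): found $n, expected $expected"
    fi
  done
}
```

For example, `check_speaker_counts 4 /path/to/output/*.rttm` would list the offending sessions, assuming that output layout.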

Unable to Train Wav2vec2 on Multiple GPUs

Could you please try setting `drop_last=true` in the dataset config and see if that helps?
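One plausible reason `drop_last` can matter in multi-GPU runs is that it changes how many batches each rank sees and whether the final batch is full size. The arithmetic below is a sketch that follows the pad-vs-trim rule of PyTorch's `DistributedSampler`, assuming the same `drop_last` value is applied to both the sampler and the batch loader. `batches_per_rank` is a hypothetical helper, and it does not claim to model NeMo's own dataloaders.

```bash
# Hypothetical helper: per-rank batch count and last-batch size for
# n samples split across w ranks with batch size b.
# Prints "<num_batches> <last_batch_size>".
batches_per_rank() {
  local n=$1 w=$2 b=$3 drop_last=$4
  local per_rank num_batches last
  if [ "$drop_last" = true ]; then
    per_rank=$(( n / w ))                  # trim the tail so ranks get equal shares
    num_batches=$(( per_rank / b ))        # the incomplete final batch is discarded
    last=$(( num_batches > 0 ? b : 0 ))
  else
    per_rank=$(( (n + w - 1) / w ))        # pad by repeating samples up to ceil(n / w)
    num_batches=$(( (per_rank + b - 1) / b ))
    last=$(( num_batches > 0 ? per_rank - (num_batches - 1) * b : 0 ))
  fi
  echo "$num_batches $last"
}
```

With 11 samples, 2 GPUs and batch size 4, `batches_per_rank 11 2 4 false` gives `2 2` (a short last batch on each rank), while `batches_per_rank 11 2 4 true` gives `1 4` (full batches only). Whether this is what unblocks training here is exactly what the suggested experiment checks.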

Unable to Train Wav2vec2 on Multiple GPUs

In that case, could you please add the following flags and run the program again? This should print more information about the actual cause of the hang:

```bash
export NCCL_DEBUG=INFO
export...
```

Modernize logger interface

> @stevehuang52 looks like the CI complains on not-related linting issues in the same file. What should be done in this case?

No worries, I got this.

Remove adapter_path from base AutoResume and refactor PEFT checkpoint handling

@cuichenx Could you please review?

Speechlm2 NaN loss

Sorry, I'm on leave, but maybe @pzelasko can help take a look?

SpeechLM not using "offset" key

Hi @AudranBert, thanks for posting the issue. You're right that the MultiModalConversation adapter doesn't actually use the audio `offset` and instead just loads the whole audio (https://github.com/NVIDIA-NeMo/NeMo/blob/main/nemo/collections/common/data/lhotse/text_adapters.py#L632 and https://github.com/NVIDIA-NeMo/NeMo/blob/main/nemo/collections/common/data/lhotse/text_adapters.py#L570). @pzelasko could...
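For context, honoring a manifest entry's `offset` (and `duration`) means reading only a slice of the audio file rather than the whole file. The conversion below is a minimal sketch. It assumes both values are in seconds and that the sample rate is known. `segment_samples` is a hypothetical helper and is not how the NeMo adapter is implemented.

```bash
# Hypothetical helper: convert an offset and duration in seconds into
# a start sample and a sample count at the given sample rate (Hz).
# Prints "<start_sample> <num_samples>", rounded to the nearest sample.
segment_samples() {
  awk -v o="$1" -v d="$2" -v sr="$3" \
    'BEGIN { printf "%d %d\n", int(o * sr + 0.5), int(d * sr + 0.5) }'
}
```

For example, `segment_samples 1.5 2.0 16000` yields `24000 32000`. With SoX that slice could be cut as `sox in.wav out.wav trim 24000s 32000s`, where the `s` suffix denotes samples.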

SpeechLM not using "offset" key

@AudranBert, the dataloader will be more efficient if you use tarred datasets. You can refer to this script https://github.com/NVIDIA-NeMo/NeMo/blob/main/scripts/speech_llm/export_conversations_to_tar.py to convert multimodal conversation manifests into tarred data.


fix canary chunk infer bug
