He Huang (Steve)

Results 36 comments of He Huang (Steve)

> 1. How can I ensure that every session has exactly as many speakers as num_speakers?

We need a little more time to figure out why `enforce_num_speakers: true` is not...

Could you please try setting `drop_last=true` in the dataset config and see if that helps?
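For context, here is a minimal sketch of what `drop_last` usually means in a batching loop: the final batch is discarded when it is smaller than the batch size. This is plain Python for illustration only, not NeMo or Lhotse code. The `batches` helper and the toy data are hypothetical.

```python
def batches(items, batch_size, drop_last=False):
    """Yield consecutive batches; optionally discard a final short batch."""
    for start in range(0, len(items), batch_size):
        batch = items[start:start + batch_size]
        # A short batch can only be the last one, so stopping here is safe.
        if drop_last and len(batch) < batch_size:
            return
        yield batch


samples = list(range(10))  # hypothetical dataset of 10 items
kept = list(batches(samples, batch_size=4, drop_last=False))
dropped = list(batches(samples, batch_size=4, drop_last=True))

print(kept)     # [[0, 1, 2, 3], [4, 5, 6, 7], [8, 9]]
print(dropped)  # [[0, 1, 2, 3], [4, 5, 6, 7]]
```

With `drop_last=True` every batch has the same size, at the cost of skipping the leftover items.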

In that case, could you please add the following flags and run the program again? There should be more information on the actual cause of hanging:

```bash
export NCCL_DEBUG=INFO
export...
```

> @stevehuang52 looks like the CI complains on not-related linting issues in the same file. What should be done in this case?

No worries, I got this.

Sorry, I'm on leave, but maybe @pzelasko can help take a look?

Hi @AudranBert, thanks for posting the issue. You're right that the MultiModalConversation adapter doesn't actually take the audio offset into account and instead just loads the whole audio (https://github.com/NVIDIA-NeMo/NeMo/blob/main/nemo/collections/common/data/lhotse/text_adapters.py#L632 and https://github.com/NVIDIA-NeMo/NeMo/blob/main/nemo/collections/common/data/lhotse/text_adapters.py#L570). @pzelasko could...
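To make the difference concrete, here is a minimal sketch of loading the whole audio versus honoring an offset and duration. It is plain Python over a list of samples, not the adapter's actual code. The `slice_audio` helper, its parameters, and the toy sample rate are all hypothetical.

```python
def slice_audio(samples, sample_rate, offset=0.0, duration=None):
    """Return the samples starting at `offset` seconds, lasting `duration` seconds.

    With the defaults (offset=0.0, duration=None) the whole audio is returned,
    which mirrors the behavior described above.
    """
    start = int(round(offset * sample_rate))
    if duration is None:
        return samples[start:]
    end = start + int(round(duration * sample_rate))
    return samples[start:end]


sample_rate = 4               # toy rate: 4 samples per second
samples = list(range(20))     # 5 seconds of hypothetical audio

whole = slice_audio(samples, sample_rate)
segment = slice_audio(samples, sample_rate, offset=1.0, duration=2.0)

print(len(whole))   # 20
print(segment)      # [4, 5, 6, 7, 8, 9, 10, 11]
```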

@AudranBert the dataloader will be more efficient with tarred datasets. You can use this script https://github.com/NVIDIA-NeMo/NeMo/blob/main/scripts/speech_llm/export_conversations_to_tar.py to get the tarred data from multimodal conversation manifests.