Taejin Park comments

Results: 36 comments of Taejin Park

ZeroDivisionError when diarizing a short file

This error seems to occur because the audio contains 0 seconds of RTTM target evaluation duration. We cannot reproduce the error unless we have access to the audio input....

ZeroDivisionError when diarizing a short file

Thanks for sharing the samples. We have plenty of issue traffic so it will take some time to try the samples and fix it, but we will definitely try this...

Diarization | IndexError: shape mismatch: indexing tensors could not be broadcast together

Hi. Let us test on the wav file you provided. This is a new type of error that we have never encountered. It appears the `p_value` value in [this line](https://github.com/NVIDIA/NeMo/blob/0fb851c8ed91f181c5550dcdce1e39f5891ec3b2/nemo/collections/asr/parts/utils/offline_clustering.py#L332) is...

OOM killed when diarizing 4 hours recording

@tttalshaul Sorry for the late reply. If you have visible CUDA devices, both embedding extraction and clustering will run in GPU RAM. ``` clustering: parameters: .... chunk_cluster_count: 50 #...

OOM killed when diarizing 4 hours recording

> gpustat showed 0 GPU utilization after embedding, there was a small step of get_argmin_mat and then OOM.. (There wasn't enough time to see more steps). First check `torch.cuda.is_available()` in...
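
The `torch.cuda.is_available()` check mentioned in this comment can be sketched as follows. This is only a minimal illustration, not NeMo code: the helper name `pick_device` is hypothetical, and the fallback to CPU when PyTorch is not installed is an assumption made so the snippet runs anywhere.

```python
def pick_device() -> str:
    """Return "cuda" if PyTorch can see a CUDA device, otherwise "cpu".

    `pick_device` is a hypothetical helper for illustration only.
    """
    try:
        # Imported lazily so the sketch also runs where PyTorch is missing.
        import torch
    except ImportError:
        return "cpu"
    # torch.cuda.is_available() is False when no CUDA device is visible,
    # e.g. when CUDA_VISIBLE_DEVICES is empty or the driver is missing.
    return "cuda" if torch.cuda.is_available() else "cpu"


device = pick_device()
print(f"Embedding extraction and clustering would run on: {device}")
```

If this prints `cpu` on a machine that has a GPU, the device is probably not visible to the process, which would be consistent with the 0 GPU utilization reported above.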

OOM killed when diarizing 4 hours recording

@tttalshaul Hi. Now I can see that what is piling up in CPU-RAM is the speaker embedding vectors for clustering. Those speaker embedding vectors cannot be removed since they need to...

OOM killed when diarizing 4 hours recording

@dkurt There are two types of OOM issues: `CPU-RAM OOM`: After extracting speaker embeddings on the GPU, the vectors need to be offloaded to CPU-RAM, because GPU-RAM is heavily limited....

OOM killed when diarizing 4 hours recording

@khaykingleb The MSDD-v2 model for the CHiME-8 baseline will give you a different DER number for the same audio file, since these CHiME-8 models are tuned for the CHiME-7 train/dev datasets.

Speaker Diarization - Extracting speaker embeddings for labels from files with multiple speakers from Cluster Diarizer models

Hi @sunraymoonbeam @maxpain Sorry for missing this question and not answering it in a timely manner. The multi-scale diarization decoder (MSDD) model only supports `diar_infer_telephonic.yaml`, so you cannot use the `diar_infer_meeting.yaml` settings. -...

OOM killed when diarizing 4 hours recording

@nithinraok I have not been able to get the 4-hour-long audio samples yet. Once I get access to them, I will work on this PR.