Taejin Park

36 comments by Taejin Park

This error seems to occur because the audio contains 0 seconds of RTTM target evaluation duration. We cannot reproduce the error unless we have access to the audio input....
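A rough pre-check for the zero-duration case mentioned above could look like the sketch below. It assumes the standard RTTM layout, where each `SPEAKER` line carries the turn onset in the fourth field and the turn duration in the fifth. The function name `rttm_total_duration` is hypothetical and not part of NeMo. It simply sums segment durations, so it is not the scoring duration a DER tool would compute (overlaps and collars are ignored).

```python
def rttm_total_duration(rttm_text: str) -> float:
    """Sum the duration field of every SPEAKER line in RTTM-formatted text.

    Hypothetical helper for illustration only; assumes whitespace-separated
    RTTM fields: type, file id, channel, onset, duration, ...
    """
    total = 0.0
    for line in rttm_text.splitlines():
        fields = line.split()
        # Skip blank lines, non-SPEAKER records, and malformed rows.
        if len(fields) < 5 or fields[0] != "SPEAKER":
            continue
        total += float(fields[4])  # fifth field is the turn duration in seconds
    return total


if __name__ == "__main__":
    sample = (
        "SPEAKER session1 1 0.00 1.50 <NA> <NA> spk0 <NA> <NA>\n"
        "SPEAKER session1 1 2.00 0.50 <NA> <NA> spk1 <NA> <NA>\n"
    )
    duration = rttm_total_duration(sample)
    if duration == 0.0:
        print("RTTM has no speech to evaluate")
    else:
        print(f"total labeled speech: {duration:.2f} s")
```

A result of `0.0` for a reference RTTM would be consistent with the "0 seconds of target evaluation duration" condition described in the comment, though the actual cause can only be confirmed with the original audio and labels.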

Thanks for sharing the samples. We have plenty of issue traffic so it will take some time to try the samples and fix it, but we will definitely try this...

Hi. Let us test on the wav file you provided. This is a new type of error we have never encountered. It appears that `p_value` in [this line](https://github.com/NVIDIA/NeMo/blob/0fb851c8ed91f181c5550dcdce1e39f5891ec3b2/nemo/collections/asr/parts/utils/offline_clustering.py#L332) is...

@tttalshaul Sorry for the late reply. If you have visible CUDA devices, both embedding extraction and clustering will run in GPU RAM. ``` clustering: parameters: .... chunk_cluster_count: 50 #...

> gpustat showed 0 GPU utilization after embedding, there was a small step of get_argmin_mat and then OOM.. (There wasn't enough time to see more steps). First check `torch.cuda.is_available()` in...

@tttalshaul Hi. Now I can see that what is piling up in CPU-RAM is the speaker embedding vectors for clustering. Those speaker embedding vectors cannot be removed since it needs to...

@dkurt There are two types of OOM issues: `CPU-RAM OOM`: After extracting speaker embeddings on the GPU, the vectors need to be offloaded to CPU-RAM, because GPU-RAM is heavily limited....

@khaykingleb The MSDD-v2 model for the CHiME-8 baseline will give you a different DER number for the same audio file, since these CHiME-8 models are tuned for the CHiME-7 train/dev datasets.

Hi @sunraymoonbeam @maxpain Sorry for missing this question and not answering it in a timely manner. The multi-scale diarization decoder (MSDD) model only supports `diar_infer_telephonic.yaml`, so you cannot use the `diar_infer_meeting.yaml` settings. -...

@nithinraok I have not been able to get the 4-hour-long audio samples yet. Once I get access to them, I will work on this PR.