NeMo
ZeroDivisionError when diarizing a short file
Describe the bug
I am getting a ZeroDivisionError: float division by zero error when diarizing a very short file with prior voice activity detection:
File "nemo/collections/asr/models/clustering_diarizer.py", line 467, in diarize
  return score_labels(
File "nemo/collections/asr/metrics/der.py", line 171, in score_labels
  CER = metric['confusion'] / metric['total']
ZeroDivisionError: float division by zero
I've diarized lots of files of various lengths, and this error seems to occur only on really short files that have identified speech in them. I understand that diarization of such short files is fundamentally questionable, but the type of error suggests that this might be an edge case bug worth reporting.
Steps/Code to reproduce bug
Please refer to this google colab notebook to reproduce the bug.
The files to reproduce the bug can be found here.
!apt-get update && apt-get install -y libsndfile1 ffmpeg
!pip install nemo_toolkit['asr']
import os
import torch
import yaml
import json
from omegaconf import OmegaConf
from nemo.collections.asr.models import ClusteringDiarizer
def diarize(workdir: str, rttm_filepath):
    manifest_path = os.path.join(workdir, "manifest.json")
    output_dir = os.path.join(workdir, "output")
    manifest = {
        'audio_filepath': '/content/diarization_test_file.wav',
        'offset': 0,
        'duration': None,
        'label': 'infer',
        'text': '-',
        'num_speakers': None,
        'rttm_filepath': rttm_filepath,
        'uem_filepath': None,
    }
    with open('/content/config.yaml', "r") as config_file:
        config_dict = yaml.load(config_file, Loader=yaml.FullLoader)
    config = OmegaConf.create(config_dict['diarizer'])
    config.device = "cuda:0" if torch.cuda.is_available() else "cpu"
    config.diarizer.manifest_filepath = manifest_path
    config.diarizer.oracle_vad = True
    config.diarizer.speaker_embeddings.model_path = 'titanet_large'
    config.diarizer.out_dir = output_dir
    with open(manifest_path, "w") as manifest_file:
        json.dump(manifest, manifest_file)
    model = ClusteringDiarizer(cfg=config)
    model.diarize()
diarize('/content/', '/content/speech_timestamps.rttm')
/usr/local/lib/python3.10/dist-packages/nemo/collections/asr/metrics/der.py in score_labels(AUDIO_RTTM_MAP, all_reference, all_hypothesis, collar, ignore_overlap, verbose)
169
170 DER = abs(metric)
--> 171 CER = metric['confusion'] / metric['total']
172 FA = metric['false alarm'] / metric['total']
173 MISS = metric['missed detection'] / metric['total']
ZeroDivisionError: float division by zero
Expected behavior
Diarization should not fail on files of short length.
Environment overview (please complete the following information)
- Environment location: Google Colab
- Method of NeMo install:
apt-get update && apt-get install -y libsndfile1 ffmpeg
pip install nemo_toolkit['asr']
Environment details
NAME="Ubuntu"
VERSION_ID="22.04"
VERSION="22.04.3 LTS (Jammy Jellyfish)"
VERSION_CODENAME=jammy
ID=ubuntu
ID_LIKE=debian
HOME_URL="https://www.ubuntu.com/"
SUPPORT_URL="https://help.ubuntu.com/"
BUG_REPORT_URL="https://bugs.launchpad.net/ubuntu/"
PRIVACY_POLICY_URL="https://www.ubuntu.com/legal/terms-and-policies/privacy-policy"
UBUNTU_CODENAME=jammy
2.1.0+cu121
Python 3.10.12
Hi @tango4j, would it be possible for you to have a look at this error? Thanks!
This error seems to happen because the RTTM contains 0 seconds of target evaluation duration. We cannot reproduce the error unless we have access to the audio input. @RafalCer Could you check the RTTM file you provide? If it contains a 0-second duration, it will throw this error. If possible, can you copy and paste the RTTM file for this audio?
Thanks for your response!
Please note that you can find the audio, rttm and config here: https://drive.google.com/drive/folders/1e2b14isJ8CQ0NR3mGqHcmkLlo4D2T38E?usp=drive_link.
Please let me know should the link not work.
The content of rttm file is as follows:
SPEAKER 0 1 0.29 0.48 <NA> <NA> <NA> <NA> <NA>
It does have speech, albeit a very short segment, which is too short for diarization either way. However, the error is somewhat inconvenient when diarization is part of a pipeline used for files of any duration. Is it possible to resolve it somehow without modifying the source code?
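One possible stopgap that leaves the library untouched is to catch the exception at the call site. The sketch below is only an illustration: `diarize_tolerating_empty_score` is a hypothetical helper name, and it assumes (unverified here) that the predicted RTTM files have already been written to the output directory by the time the scoring step raises.

```python
def diarize_tolerating_empty_score(diarize_fn):
    """Call diarize_fn() and report whether scoring completed.

    Returns (result, scored). scored is False when the call raised
    ZeroDivisionError, which in this thread comes from the scoring step.
    Hypothetical helper, not part of NeMo.
    """
    try:
        return diarize_fn(), True
    except ZeroDivisionError:
        # Assumption: any diarization output on disk is still usable;
        # only the DER/CER/FA/MISS numbers are missing.
        return None, False
```

In the reproduction script this would be called as `result, scored = diarize_tolerating_empty_score(model.diarize)`, and the pipeline would then skip the metrics whenever `scored` is False.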
Thanks for sharing the samples. We have plenty of issue traffic so it will take some time to try the samples and fix it, but we will definitely try this sample to see what is the issue.
Seems like the diarization pipeline currently cannot handle the extremely short samples (0.5 second is shorter than the shortest segment length).
Also, could you please clarify "the error is somewhat inconvenient"? Do you mean the way the error is calculated is inconvenient?
Thank you so much, I really appreciate your time!
I see, the short segment length might indeed be causing this, as then metric['total'] is probably 0, thus the ZeroDivisionError: float division by zero.
Could you please suggest what would be a good solution for this - should the evaluation not be called when the sample is too short, or should some value other than 0 be used for the calculation? I will try to look into this and post a PR if I manage to find the time.
By saying "the error is somewhat inconvenient", I meant that the error in this case seems to arise during the evaluation of the diarization rather than the diarization itself (I might be wrong on this one). Perhaps, instead of failing on short segments, the diarization could assign the short segment to one speaker by default?
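To make the first option in the question above concrete, here is a minimal sketch of a guard that skips the rate computation when nothing is left to score. It is not the NeMo implementation or an agreed fix: `error_rates` is a hypothetical function, and `metric` is modelled as a plain dict using the same keys that appear in the traceback.

```python
def error_rates(metric):
    """Return CER, FA and MISS as fractions of the scored duration.

    Returns None when metric['total'] is 0, i.e. there is no reference
    speech left to evaluate, instead of dividing by zero.
    """
    total = metric['total']
    if total == 0:
        return None
    return {
        'CER': metric['confusion'] / total,
        'FA': metric['false alarm'] / total,
        'MISS': metric['missed detection'] / total,
    }
```

Returning None rather than 0.0 keeps "nothing was scored" distinguishable from "scored with no errors"; which of the two a pipeline should report is a design choice this sketch leaves open.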
This issue is stale because it has been open for 30 days with no activity. Remove stale label or comment or this will be closed in 7 days.
Hi, have you managed to solve this bug? I have the same problem here. It is caused by a wav file like this:
reference RTTM:
SPEAKER e1d3d1e5-fd82-a0da-20de-51a0a558a8ee 1 0.000 1.800 <NA> <NA> 1 <NA> <NA>
SPEAKER e1d3d1e5-fd82-a0da-20de-51a0a558a8ee 1 0.400 1.100 <NA> <NA> 0 <NA> <NA>
predicted RTTM:
SPEAKER e1d3d1e5-fd82-a0da-20de-51a0a558a8ee 1 0.060 0.270 <NA> <NA> speaker_0 <NA> <NA>
SPEAKER e1d3d1e5-fd82-a0da-20de-51a0a558a8ee 1 0.780 0.430 <NA> <NA> speaker_1 <NA> <NA>
When I set config.diarizer.collar = 0, the problem disappears. So I believe that the "collar" of pyannote.metrics.DiarizationErrorRate is larger than the duration of a predicted segment, which causes this bug. Do you have any idea how to solve this? Simply setting collar=0 is an acceptable solution, but the DER will then always be larger than the DER computed with the default collar=0.25*2.
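The collar hypothesis can be checked with simple arithmetic. The sketch below is a simplified model, not pyannote's actual code: it assumes a no-score region of `collar_per_side` seconds on each side of every reference segment boundary (0.25 s per side, matching the 0.25*2 mentioned above), assumes non-overlapping reference segments, and uses hypothetical helper names.

```python
def merge_intervals(intervals):
    """Merge overlapping (start, end) intervals into a sorted, disjoint list."""
    merged = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1]:
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


def scored_duration(segments, collar_per_side):
    """Reference speech (seconds) left after removing collars.

    segments: list of (start, duration) tuples, as in RTTM lines.
    """
    # One no-score interval around each segment start and each segment end.
    collars = []
    for start, dur in segments:
        for boundary in (start, start + dur):
            collars.append((boundary - collar_per_side, boundary + collar_per_side))
    collars = merge_intervals(collars)

    total = 0.0
    for start, dur in segments:
        end = start + dur
        # Length of this segment that falls inside any collar interval.
        removed = sum(
            max(0.0, min(end, c_end) - max(start, c_start))
            for c_start, c_end in collars
        )
        total += max(0.0, (end - start) - removed)
    return total


# The single segment from the first report: starts at 0.29 s, lasts 0.48 s.
print(scored_duration([(0.29, 0.48)], collar_per_side=0.25))
print(scored_duration([(0.29, 0.48)], collar_per_side=0.0))
```

Under this model the two collars around 0.29 s and 0.77 s cover the whole 0.48 s segment, so nothing is left to score and `metric['total']` would be 0. With a collar of 0 the full segment is scored, which is consistent with the workaround described above.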
Hi, sorry for such a late response.
Sorry, I did not have the time to look into this issue myself. But thank you so much for the tip regarding the collar value. It has indeed solved the issue on short files.