wespeaker
Research and Production Oriented Speaker Verification, Recognition and Diarization Toolkit
When wespeaker is used with torch>=2.1, it outputs this error: " > [ WARNING : 2024-07-20 17:11:39,248 ] - error to parse id07100/uUtjsdtDOkQ/00327.wav.wav > [ WARNING : 2024-07-20...
We plan to make the following updates to wespeaker in the coming weeks: - [ ] Support SSL pretrained frontend such as WavLM - [ ] Support architectures that accept raw waves...
Hi, first of all, I want to thank you for your contribution. Today I used your example to retrain voxceleb/Resnet34. The dataset is the default vox1 and vox2 (downloaded via your default script utils)...
Will there be support for processing multiple files on the GPU at a time?
[rank0]: forward(__torch__.torch.nn.modules.container.___torch_mangle_16.Sequential self, Tensor input) -> Tensor: [rank0]: Expected a value of type 'Tensor (inferred)' for argument 'input' but instead found type 'Optional[Tensor]'. [rank0]: Inferred 'input' to be of type...
Hi, I noticed that a Hamming window is used instead of the default povey window in the ONNX inference demo https://github.com/wenet-e2e/wespeaker/blob/master/wespeaker/bin/infer_onnx.py#L47 . May I know the reason for using this? Are all models trained...
Hi, I am trying to reproduce the redimnetB2 results. I trained the model and got on Vox-O: 0.7% (no LM, no ASNORM), 0.61% (no LM, with ASNORM), 0.56 (no LM,...
Very high missed-detection rate
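Hello, I tested this code on several datasets and found a very high missed-detection rate (MS), which leads to a high DER. Which part of the code needs to be modified?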
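While using wespeaker, I found that it often fails to separate the speakers. For example, the recording in the attachment is a conversation between a man and a woman whose voices sound quite different, but the final test result is as shown below. So my question is: are there any parameters, such as a similarity threshold, that could improve accuracy? I read through the Speaker class carefully but found nothing useful: ``` ('unk', 0.1, 1.9, 0) ('unk', 2.0, 4.1, 0) ('unk', 4.7, 5.7, 0) ('unk', 29.8, 30.3, 0) ('unk', 32.5, 33.2, 0) ('unk', 33.5, 36.1, 0) ('unk', 36.5, 38.9, 0)...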
I'm trying to train DINO SSL on my own dataset (1.2M samples), and the training process is very slow even though my dataset is stored as shard files. This...