wespeaker issues

Unable to load shard

2

When wespeaker is run with torch>=2.1, it outputs this error: " > [ WARNING : 2024-07-20 17:11:39,248 ] - error to parse id07100/uUtjsdtDOkQ/00327.wav.wav > [ WARNING : 2024-07-20...

mrjunjieli

Updates Plan

We will make the following updates to wespeaker in the coming weeks:
- [ ] Support SSL pretrained frontend such as WavLM
- [ ] Support architectures that accept raw waves...

wsstriving

[Question] Is this training log normal or not?

10

Hi, first of all, thank you for your contribution. Today I used your example to retrain voxceleb/Resnet34. The dataset is the default vox1 and vox2 (downloaded via your default script utils)...

hungnvk54

Batch processing of files?

3

Will there be support for processing multiple files on the GPU at a time?

sleepingcat4

Expected a value of type 'Tensor (inferred)' for argument 'input' but instead found type 'Optional[Tensor]'.

1

[rank0]: forward(__torch__.torch.nn.modules.container.___torch_mangle_16.Sequential self, Tensor input) -> Tensor:
[rank0]: Expected a value of type 'Tensor (inferred)' for argument 'input' but instead found type 'Optional[Tensor]'.
[rank0]: Inferred 'input' to be of type...

gray5wolf

using hamming window for onnx inference

1

Hi, I noticed that a hamming window is used instead of the default povey window in the ONNX inference demo: https://github.com/wenet-e2e/wespeaker/blob/master/wespeaker/bin/infer_onnx.py#L47 . May I know the reason for this? Are all models trained...

jie-chen

Redimnet Large Margin not improving the results

4

Hi, I am trying to reproduce the redimnetB2 results. I trained the model and got on Vox-O: 0.7% (no LM, no ASNORM), 0.61% (no LM, with ASNORM), 0.56 (no LM,...

GoldMan6

Very high missed detection rate

4

Hello, I tested this code on several datasets and found a very high missed detection rate (MS), which leads to a high DER. Which part of the code needs to be modified? ![Image](https://github.com/user-attachments/assets/a2980ac3-68bf-4cc0-a89e-bf3cad78d244) ![Image](https://github.com/user-attachments/assets/671a0520-4117-4920-8069-b620fec6e3af)

Rainingiii

How to improve speaker recognition accuracy

4

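While using wespeaker, I found that it often fails to separate the speakers. For example, the attached recording is a conversation between a man and a woman whose voices sound quite different, yet the test result is as shown below. So my question is: is there any parameter, such as similarity, that can improve the accuracy? I read through the Speaker class carefully but found nothing helpful: ``` ('unk', 0.1, 1.9, 0) ('unk', 2.0, 4.1, 0) ('unk', 4.7, 5.7, 0) ('unk', 29.8, 30.3, 0) ('unk', 32.5, 33.2, 0) ('unk', 33.5, 36.1, 0) ('unk', 36.5, 38.9, 0)...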

chenfuckthesky

IO Bottleneck while loading data

1

I'm trying to train DINO SSL on my own dataset (1.2M samples), and training is very slow even though my dataset is stored as shard files. This...

chnk58hoang
