Comments by Hervé BREDIN (270 results)

What are the alternatives? Feel free to open a PR.

Fine-tuning the speaker embedding is currently not implemented, as `pyannote` relies on external libraries for that part. You can, however, tune the clustering threshold for your use case. [This tutorial](https://github.com/pyannote/pyannote-audio/blob/develop/tutorials/adapting_pretrained_pipeline.ipynb) may...
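For what it's worth, here is a minimal sketch of how the clustering threshold of a pretrained pipeline could be overridden; the checkpoint name, the token placeholder, and the threshold value are assumptions to adapt to your own setup:

```python
from pyannote.audio import Pipeline

# assumption: the pyannote/speaker-diarization-3.1 checkpoint and a valid Hugging Face token
pipeline = Pipeline.from_pretrained(
    "pyannote/speaker-diarization-3.1", use_auth_token="YOUR_HF_TOKEN"
)

# start from the pretrained hyper-parameters and only change the clustering threshold
params = pipeline.parameters(instantiated=True)
params["clustering"]["threshold"] = 0.75  # tune this value on your own development set
pipeline.instantiate(params)

diarization = pipeline("audio.wav")
```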

Plaquet's paper comes with a companion repository (https://github.com/FrenchKrab/IS2023-powerset-diarization) that does include a pipeline based on `speechbrain` ECAPA-TDNN.
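Untested sketch of what such a pipeline could look like in `pyannote.audio` with a `speechbrain` ECAPA-TDNN embedding; the model names and hyper-parameter values below are assumptions, and the companion repository remains the reference for the exact configuration:

```python
from pyannote.audio.pipelines import SpeakerDiarization

# model names are assumptions; a Hugging Face token may be required for gated models
pipeline = SpeakerDiarization(
    segmentation="pyannote/segmentation-3.0",
    embedding="speechbrain/spkrec-ecapa-voxceleb",
    clustering="AgglomerativeClustering",
)

# hyper-parameter values are placeholders, not the ones tuned in the companion repository
pipeline.instantiate({
    "segmentation": {"min_duration_off": 0.0},
    "clustering": {"method": "centroid", "min_cluster_size": 12, "threshold": 0.7},
})

diarization = pipeline("audio.wav")
```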

What you are looking for is speaker separation, not speaker diarization. `pyannote` does not do that... yet... but we are working on it! In the meantime, you might want to...

You may want to try reducing `pipeline.embedding_batch_size`, which [defaults to 32](https://huggingface.co/pyannote/speaker-diarization-3.1/blob/eb9d8dd72c3ae9de0c77346f4254dfb62d861cb3/config.yaml#L8).
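A minimal sketch, assuming the `pyannote/speaker-diarization-3.1` checkpoint and a placeholder token:

```python
from pyannote.audio import Pipeline

pipeline = Pipeline.from_pretrained(
    "pyannote/speaker-diarization-3.1", use_auth_token="YOUR_HF_TOKEN"
)

# smaller batches trade some speed for a lower peak memory footprint; 8 is an arbitrary example
pipeline.embedding_batch_size = 8

diarization = pipeline("audio.wav")
```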

`pyannote` relies on `torchaudio` to read audio files. If `torchaudio.load` can load a file, it is supported. If `torchaudio.load` cannot load a file, it is not supported.
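In other words, a quick way to check support for a given file (the path is just an example):

```python
import torchaudio

# if this call succeeds, pyannote should be able to process the file;
# if it raises an error, the format is not supported
waveform, sample_rate = torchaudio.load("audio.m4a")
print(waveform.shape, sample_rate)
```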

You can definitely use `pyannote.audio` to train such a model. See [this tutorial](https://github.com/pyannote/pyannote-audio/blob/develop/tutorials/add_your_own_task.ipynb). However, you'll need labeled training data.
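As a rough illustration of the training loop (using the built-in voice activity detection task as a stand-in, with a placeholder protocol name pointing to your own labeled data):

```python
import pytorch_lightning as pl
from pyannote.database import FileFinder, get_protocol
from pyannote.audio.tasks import VoiceActivityDetection
from pyannote.audio.models.segmentation import PyanNet

# "MyDatabase.SpeakerDiarization.MyProtocol" is a placeholder: it must be declared
# in your own pyannote.database configuration and point to labeled audio files
protocol = get_protocol(
    "MyDatabase.SpeakerDiarization.MyProtocol", preprocessors={"audio": FileFinder()}
)

task = VoiceActivityDetection(protocol, duration=2.0, batch_size=32)
model = PyanNet(task=task)

trainer = pl.Trainer(max_epochs=1)
trainer.fit(model)
```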

I have no plans to train such a model... but one should never say never ;-)