pyannote-audio

Is it possible to convert the model to ONNX and then use it in C++?

Open leohuang2013 opened this issue 2 years ago • 10 comments

Is it possible to convert the model to ONNX and then use it in C++ for speaker diarization? Thanks.

leohuang2013 avatar Apr 11 '23 07:04 leohuang2013

We found the following entry in the FAQ which you may find helpful:

Feel free to close this issue if you found an answer in the FAQ. Otherwise, please give us a little time to review.

This is an automated reply, generated by FAQtory

github-actions[bot] avatar Apr 11 '23 07:04 github-actions[bot]

https://github.com/pengzhendong/pyannote-onnx

pengzhendong avatar Jul 20 '23 03:07 pengzhendong

@pengzhendong

I am looking to implement pyannote's speaker diarization with ONNX. I've been referring to this link: https://github.com/pengzhendong/pyannote-onnx. However, the linked repository doesn't seem to implement the speaker-diarization output.

I want to make the necessary adjustments myself, but pyannote's speaker-diarization operates by loading multiple models. Considering this, I'm unsure how to proceed with the modifications. I would appreciate it if you could provide me with advice or instructions on the specific steps or methods to follow.

kfsky avatar Aug 04 '23 05:08 kfsky

@kfsky Could you provide a link showing that "pyannote's speaker-diarization operates by loading multiple models"?

pengzhendong avatar Aug 04 '23 06:08 pengzhendong

@pengzhendong I have been referring to this notebook: https://github.com/pyannote/pyannote-audio/blob/develop/tutorials/applying_a_pipeline.ipynb. When executing the following section of the notebook, multiple models get downloaded:

```python
from pyannote.audio import Pipeline
pipeline = Pipeline.from_pretrained("pyannote/speaker-diarization@develop", use_auth_token=True)
```

Therefore, I believe these multiple models are necessary for the conversion to ONNX. Is my understanding incorrect?

kfsky avatar Aug 04 '23 07:08 kfsky

@kfsky There are two models:

  1. https://huggingface.co/pyannote/segmentation
  2. https://huggingface.co/speechbrain/spkrec-ecapa-voxceleb

The first one is used to segment the audio (pyannote-onnx does the same thing). The second one is used to get the embeddings of the segments.
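To make the "segment the audio" step concrete, here is a minimal sketch of the kind of post-processing that turns frame-level model output into time segments. It assumes the output has already been reduced to a single list of per-frame speech scores; `frames_to_segments`, `frame_duration`, and the 0.5 threshold are hypothetical names and values chosen for illustration, not pyannote's or pyannote-onnx's API.

```python
from typing import List, Tuple


def frames_to_segments(
    scores: List[float],
    frame_duration: float,
    threshold: float = 0.5,
) -> List[Tuple[float, float]]:
    """Turn per-frame speech scores into (start, end) segments in seconds."""
    segments = []
    start = None  # index of the first frame of the currently open segment
    for i, score in enumerate(scores):
        if score >= threshold and start is None:
            start = i  # speech begins at this frame
        elif score < threshold and start is not None:
            # speech ended just before this frame
            segments.append((start * frame_duration, i * frame_duration))
            start = None
    if start is not None:
        # close a segment that runs to the end of the audio
        segments.append((start * frame_duration, len(scores) * frame_duration))
    return segments


# Hypothetical scores: frames 1-2 and frame 4 are above the threshold.
print(frames_to_segments([0.1, 0.9, 0.8, 0.2, 0.7], frame_duration=0.5))
```

Logic like this runs on the model output rather than inside the network, so it would probably have to be reimplemented in C++ (or Rust) next to the ONNX inference call.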

pengzhendong avatar Aug 04 '23 07:08 pengzhendong

@pengzhendong

> The second one is used to get the embeddings of the segments.

Could you possibly share some ideas on the steps to follow when incorporating the second model into pyannote-onnx?

kfsky avatar Aug 04 '23 08:08 kfsky

@kfsky Please refer to this file: https://github.com/pyannote/pyannote-audio/blob/develop/pyannote/audio/pipelines/speaker_diarization.py
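For orientation: once each segment has an embedding, those embeddings still have to be grouped into speakers. The sketch below uses a simplified greedy cosine-similarity clustering as a stand-in. It is not the clustering algorithm that the pipeline file implements, and `assign_speakers` and the 0.7 threshold are hypothetical.

```python
import math
from typing import List


def cosine_similarity(a: List[float], b: List[float]) -> float:
    """Cosine similarity of two vectors; 0.0 if either has zero norm."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def assign_speakers(embeddings: List[List[float]], threshold: float = 0.7) -> List[int]:
    """Greedily label each segment embedding with a speaker index."""
    # Running sum of embeddings per speaker (same direction as the mean).
    centroids: List[List[float]] = []
    labels: List[int] = []
    for emb in embeddings:
        best, best_sim = -1, threshold
        for k, centroid in enumerate(centroids):
            sim = cosine_similarity(emb, centroid)
            if sim >= best_sim:
                best, best_sim = k, sim
        if best < 0:
            # No existing speaker is similar enough: start a new one.
            centroids.append(list(emb))
            best = len(centroids) - 1
        else:
            # Fold the embedding into the matched speaker's centroid.
            centroids[best] = [x + y for x, y in zip(centroids[best], emb)]
        labels.append(best)
    return labels


# Hypothetical 2-D embeddings: two segments per speaker.
print(assign_speakers([[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]))
```

The practical point for an ONNX port is that only the two neural networks need exporting. The grouping step is ordinary code operating on their outputs and would be rewritten in the target language.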

pengzhendong avatar Aug 04 '23 08:08 pengzhendong

@kfsky Did you manage to export the whole diarization pipeline to ONNX?

mark95 avatar Aug 22 '23 09:08 mark95

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

stale[bot] avatar Feb 18 '24 14:02 stale[bot]

I'm also looking to convert the pyannote model to ONNX format and then use it from Rust with ort. Did anyone manage to use it in C++?

thewh1teagle avatar Jun 29 '24 18:06 thewh1teagle