pyannote-audio
Is a fine-tuned model applicable to other languages?
If I want to use a pipeline for Mandarin, should I start training from scratch or just fine-tune the model?
Thank you for your issue. Give us a little time to review it.
PS. You might want to check the FAQ if you haven't done so already.
This is an automated reply, generated by FAQtory
Depends on how much data you've got! I've gotten good results with Japanese with about 66 hours of labeled data.
@cryptowooser Did you fine-tune the segmentation model or the whole pipeline? Did you need to fine-tune the embedding model?
I just did the segmentation model, but if there's a guide to fine-tuning the pipeline or the embedding model somewhere, I'd love to see it! I'd love to improve the general performance.
But when you fine-tuned the segmentation model, you got better results on Japanese than the pretrained English model, right?
Yes, results were a MASSIVE improvement.
Great!
1. Could you tell me the specific metric you used to evaluate the segmentation model?
2. Could you share the metric values for the segmentation model before and after fine-tuning, on the same test data?
The metric used was DER. It dropped, I want to say, 20-25 points? From 55 or so down to the mid 30s. This was the methodology I used:
https://github.com/pyannote/pyannote-audio/blob/develop/tutorials/adapting_pretrained_pipeline.ipynb
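For context on the metric being compared here: DER sums missed speech, false alarm, and speaker confusion over the total amount of reference speech. A toy frame-level sketch in plain Python (it ignores the optimal speaker mapping and the forgiveness collar that pyannote.metrics applies, and the labels are made up for illustration):

```python
def toy_der(reference, hypothesis):
    """Frame-level diarization error rate; None marks non-speech frames."""
    total = sum(1 for r in reference if r is not None)  # reference speech frames
    missed = sum(1 for r, h in zip(reference, hypothesis)
                 if r is not None and h is None)        # speech labeled as silence
    false_alarm = sum(1 for r, h in zip(reference, hypothesis)
                      if r is None and h is not None)   # silence labeled as speech
    confusion = sum(1 for r, h in zip(reference, hypothesis)
                    if r is not None and h is not None and r != h)  # wrong speaker
    return (missed + false_alarm + confusion) / total

ref = ["A", "A", "B", "B", None, "B"]  # ground-truth speaker per frame
hyp = ["A", "B", "B", None, "A", "B"]  # system output per frame
der = toy_der(ref, hyp)  # (1 missed + 1 false alarm + 1 confusion) / 5 = 0.6
```

A drop from a DER of ~55% to the mid 30s, as reported above, means roughly a third of the total error mass was removed by fine-tuning.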
I see you used the notebook adapting_pretrained_pipeline.ipynb and not training_a_model.ipynb. In training_a_model.ipynb, the pretrained segmentation model is evaluated with DiscreteDiarizationErrorRate. I thought you had used the same methodology for evaluating the segmentation model.
No, I used the methodology described in the linked notebook. It worked pretty well; I was impressed! I'm interested in figuring out how to get more mileage out of it, but I like where it's at.
I think you could visit SpeechBrain, fine-tune the speaker embedding model, use it in the diarization pipeline, and see the improvement.
Excellent, I'd love to try. Is SpeechBrain a better choice than pyannote's own embedding models? I haven't looked at the embedding side of things much, so if there's more info out there about how to do this, I'd love to hear it.
Yes, actually the current version of the diarization pipeline uses the model from SpeechBrain: https://huggingface.co/speechbrain/spkrec-ecapa-voxceleb. See their documentation; it's easy to follow.
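To sketch where the embedding model fits in the pipeline: each speech segment is mapped to a fixed-size speaker embedding (the ECAPA-TDNN model linked above produces 192-dimensional vectors), and segments are then grouped by cosine similarity, so a better-adapted embedding model separates speakers more cleanly. A minimal stdlib illustration of that comparison step, with made-up 3-d vectors standing in for real embeddings:

```python
import math

def cosine_similarity(u, v):
    """Cosine of the angle between two embedding vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

# Toy 3-d stand-ins for speaker embeddings (real ECAPA vectors are much larger).
seg_a1 = [0.9, 0.1, 0.0]  # segment from speaker A
seg_a2 = [0.8, 0.2, 0.1]  # another segment from speaker A
seg_b1 = [0.0, 0.2, 0.9]  # segment from speaker B

same_speaker = cosine_similarity(seg_a1, seg_a2)
diff_speaker = cosine_similarity(seg_a1, seg_b1)
# same_speaker comes out much larger, so clustering groups a1 with a2.
```

In the real pipeline this comparison is done over embeddings extracted per speech region, and the clustering step is what assigns speaker identities.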
This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.
I've gotten good results with Japanese with about 66 hours of labeled data.
@cryptowooser Would you mind sharing the data you used? If you still have it, of course; I know it's been a while.
I've gotten good results with Japanese with about 66 hours of labeled data.
@cryptowooser Me too; would you mind sharing the data you used?