pyannote-audio
Is a fine-tuned model applicable to other languages?
If I want to use a pipeline for Mandarin, should I start training from scratch or just fine-tune the model?
Thank you for your issue. Give us a little time to review it.
PS. You might want to check the FAQ if you haven't done so already.
This is an automated reply, generated by FAQtory
Depends on how much data you've got! I've gotten good results with Japanese with about 66 hours of labeled data.
@cryptowooser Did you fine-tune the segmentation model or the whole pipeline? Did you need to fine-tune the embedding model?
I just did the segmentation model, but if there's a guide to fine-tuning the pipeline or the embedding model somewhere, I'd love to see it! I'd love to improve the general performance.
But when you fine-tuned the segmentation model, you got better results on Japanese than the pretrained English model, right?
Yes, results were a MASSIVE improvement.
Great!
1. Could you tell me the specific metric you used to evaluate the segmentation model?
2. Could you share the metric values for the segmentation model before and after fine-tuning, on the same test data?
The metric used was DER. It dropped, I want to say, 20-25 points? From 55 or so down to the mid 30s. This was the methodology I used:
https://github.com/pyannote/pyannote-audio/blob/develop/tutorials/adapting_pretrained_pipeline.ipynb
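For context on the metric being compared here: DER sums missed speech, false alarm, and speaker confusion over the total amount of reference speech. A toy frame-level sketch in plain Python (it ignores the optimal speaker mapping and the forgiveness collar that pyannote.metrics applies, and the labels are made up for illustration):

```python
def toy_der(reference, hypothesis):
    """Frame-level diarization error rate; None marks non-speech frames."""
    total = sum(1 for r in reference if r is not None)  # reference speech frames
    missed = sum(1 for r, h in zip(reference, hypothesis)
                 if r is not None and h is None)        # speech labeled as silence
    false_alarm = sum(1 for r, h in zip(reference, hypothesis)
                      if r is None and h is not None)   # silence labeled as speech
    confusion = sum(1 for r, h in zip(reference, hypothesis)
                    if r is not None and h is not None and r != h)  # wrong speaker
    return (missed + false_alarm + confusion) / total

ref = ["A", "A", "B", "B", None, "B"]  # ground-truth speaker per frame
hyp = ["A", "B", "B", None, "A", "B"]  # system output per frame
der = toy_der(ref, hyp)  # (1 missed + 1 false alarm + 1 confusion) / 5 = 0.6
```

A drop from a DER of ~55% to the mid 30s, as reported above, means roughly a third of the total error mass was removed by fine-tuning.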
I see you used the notebook adapting_pretrained_pipeline.ipynb and not training_a_model.ipynb. In training_a_model.ipynb, the pretrained segmentation model is evaluated with DiscreteDiarizationErrorRate. I thought you had used the same methodology for evaluating the segmentation model.
No, I used the methodology described in the linked notebook. It worked pretty well; I was impressed! I'm interested in figuring out how to get more mileage out of it, but I like where it's at.
I think you could visit SpeechBrain, fine-tune the speaker embedding model, use it in the diarization pipeline, and see the improvement.
Excellent, I'd love to try. Is SpeechBrain a better choice than pyannote's own embedding models? I haven't looked at the embedding side of things much, so if there's more info out there about how to do this, I'd love to hear it.
Yes, actually the current version of the diarization pipeline uses the model from SpeechBrain: https://huggingface.co/speechbrain/spkrec-ecapa-voxceleb. See their documentation; it's easy to follow.
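To sketch where the embedding model fits in the pipeline: each speech segment is mapped to a fixed-size speaker embedding (the ECAPA-TDNN model linked above produces 192-dimensional vectors), and segments are then grouped by cosine similarity, so a better-adapted embedding model separates speakers more cleanly. A minimal stdlib illustration of that comparison step, with made-up 3-d vectors standing in for real embeddings:

```python
import math

def cosine_similarity(u, v):
    """Cosine of the angle between two embedding vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

# Toy 3-d stand-ins for speaker embeddings (real ECAPA vectors are much larger).
seg_a1 = [0.9, 0.1, 0.0]  # segment from speaker A
seg_a2 = [0.8, 0.2, 0.1]  # another segment from speaker A
seg_b1 = [0.0, 0.2, 0.9]  # segment from speaker B

same_speaker = cosine_similarity(seg_a1, seg_a2)
diff_speaker = cosine_similarity(seg_a1, seg_b1)
# same_speaker comes out much larger, so clustering groups a1 with a2.
```

In the real pipeline this comparison is done over embeddings extracted per speech region, and the clustering step is what assigns speaker identities.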
This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.
I've gotten good results with Japanese with about 66 hours of labeled data.
@cryptowooser Would you mind sharing the data you used? If you still have it, of course; I know it's been a while.
I've gotten good results with Japanese with about 66 hours of labeled data.
@cryptowooser Me too; would you mind sharing the data you used?