pyannote-audio
ArcFace embedding task is broken
Describe the bug
The SpeakerEmbedding / SupervisedRepresentationLearningWithArcFace task cannot be used anymore for training since commit 0fb42ad08cb92539a37702cb7afcc20dad8871d7. It is missing an implementation of the collate_fn method.
Is this known/expected? I see that this task is not tested in pyannote-audio/tests/test_train.py.
To Reproduce
import pytorch_lightning as pl
from pyannote.database import get_protocol, FileFinder
from pyannote.audio.models.embedding.debug import SimpleEmbeddingModel
from pyannote.audio.tasks import SpeakerEmbedding

# Load the debug protocol and let FileFinder resolve the audio paths.
protocol = get_protocol(
    "Debug.SpeakerDiarization.Debug",
    preprocessors={"audio": FileFinder()},
)

# Build the ArcFace embedding task and attach it to a small debug model.
emb = SpeakerEmbedding(protocol)
model = SimpleEmbeddingModel(task=emb)

# Training fails here because the task does not implement collate_fn.
trainer = pl.Trainer(max_epochs=1)
_ = trainer.fit(model)
Thanks for reporting this.
This is indeed expected and should definitely be fixed, but unfortunately it is not at the top of my priority list (I have been using the speechbrain speaker embedding instead).
Would you like to contribute a fix? Or at least a test?
Sure, I can give it a try! I suppose restoring the previous behavior relying on default_collate should be enough?
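For readers following along, here is a minimal sketch of what "relying on default_collate" could look like. It is not the actual fix: the CollateWithDefault class is hypothetical, the collate_fn signature is assumed (the real task may expect extra arguments), and the sample keys "X" and "y" are placeholders chosen for illustration.

```python
import torch
from torch.utils.data.dataloader import default_collate


class CollateWithDefault:
    """Hypothetical mixin for illustration; not part of pyannote.audio."""

    def collate_fn(self, batch):
        # Delegate to PyTorch's default collation, which stacks a list of
        # per-sample dicts field by field into batched tensors.
        return default_collate(batch)


# Two fake samples: (channel, time) waveforms and integer speaker labels.
# The "X" / "y" keys are assumptions, not confirmed by the thread.
samples = [
    {"X": torch.zeros(1, 16000), "y": 0},
    {"X": torch.ones(1, 16000), "y": 1},
]

batch = CollateWithDefault().collate_fn(samples)
print(batch["X"].shape, batch["y"])
```

This only works when every sample in the batch has the same shape; variable-length chunks would need padding or cropping before collation.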
Also, do you fine-tune the speechbrain ecapa embeddings or do you directly use the ones pretrained on VoxCeleb?
> Sure, I can give it a try! I suppose restoring the previous behavior relying on default_collate should be enough?
"Enough", I am not 100% sure. But it definitely is "needed".
> Also, do you fine-tune the speechbrain ecapa embeddings or do you directly use the ones pretrained on VoxCeleb?
I use them directly for the time being...
This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.
@olvb this should be fixed in the develop branch.
Thanks! Will give it a try.