ArcFace embedding task is broken
Describe the bug
The SpeakerEmbedding / SupervisedRepresentationLearningWithArcFace task can no longer be used for training since commit 0fb42ad08cb92539a37702cb7afcc20dad8871d7: it is missing an implementation of the collate_fn method.
Is this known/expected? I see that this task is not covered by pyannote-audio/tests/test_train.py.
To Reproduce
import pytorch_lightning as pl
from pyannote.database import get_protocol, FileFinder
from pyannote.audio.models.embedding.debug import SimpleEmbeddingModel
from pyannote.audio.tasks import SpeakerEmbedding
protocol = get_protocol(
"Debug.SpeakerDiarization.Debug",
preprocessors={"audio": FileFinder()}
)
emb = SpeakerEmbedding(protocol)
model = SimpleEmbeddingModel(task=emb)
trainer = pl.Trainer(max_epochs=1)
_ = trainer.fit(model)
Thanks for reporting this.
This is indeed expected and should definitely be fixed, but unfortunately it is not at the top of my priority list (I have been using speechbrain speaker embeddings instead).
Would you like to contribute a fix? Or at least a test?
Sure, I can give it a try! I suppose restoring the previous behavior relying on default_collate should be enough?
Also, do you fine-tune the speechbrain ECAPA embeddings, or do you directly use the ones pretrained on VoxCeleb?
> Sure, I can give it a try! I suppose restoring the previous behavior relying on default_collate should be enough?
"Enough", I am not 100% sure. But it definitely is "needed".
> Also, do you fine-tune the speechbrain ECAPA embeddings, or do you directly use the ones pretrained on VoxCeleb?
I use them directly for the time being...
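For reference, here is a minimal sketch of what falling back on default_collate (as discussed above) could look like in plain PyTorch. It is not the fix that ended up in pyannote.audio: the ToyChunks dataset, the "X" / "y" sample keys, and the chunk shape are assumptions made up for illustration, and a real task may need more than this.

```python
import torch
from torch.utils.data import DataLoader, Dataset
from torch.utils.data.dataloader import default_collate


class ToyChunks(Dataset):
    """Hypothetical stand-in for the samples an embedding task would yield."""

    def __len__(self):
        return 8

    def __getitem__(self, idx):
        # Assumed sample layout: a fixed-size waveform chunk "X"
        # (channel, samples) and an integer speaker class "y".
        return {"X": torch.zeros(1, 16000), "y": idx % 4}


def collate_fn(batch):
    # Delegate to PyTorch's default collation: tensors are stacked along a
    # new leading batch dimension, Python ints are gathered into a 1-D tensor.
    return default_collate(batch)


loader = DataLoader(ToyChunks(), batch_size=4, collate_fn=collate_fn)
batch = next(iter(loader))
print(batch["X"].shape, batch["y"])
```

This only works as-is because every chunk has the same shape; samples of varying duration would need padding or cropping before they can be stacked.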
@olvb this should be fixed in the develop branch.
Thanks! Will give it a try