diart
Follow-up to Feature Request: Implementing Persistent Speaker Embeddings Across Conversations #227
Hey @juanmc2005,
Following up on issue #227:
- I have implemented speaker centroids as a list of embeddings without a mapping, assuming speaker_0 == centroid 0 inside SpeakerDiarization
- I have added a better version of RedisWriter
- It looks like importing redis is unnecessary if we pass a redis_client object into RedisWriter
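To illustrate the last point, here is a minimal sketch of a sink that receives its client from the caller. It is hypothetical and is not the RedisWriter from this PR: the class name, the key layout and the JSON payload are assumptions made for illustration. The only thing it assumes about the client is an `rpush(key, value)` method, which redis-py clients provide, so the module defining the sink never needs to import redis itself.

```python
import json


class InjectedClientWriter:
    """Hypothetical sink that stores speaker turns through a caller-supplied client."""

    def __init__(self, uri, client):
        # Assumed key layout: one Redis list per audio URI.
        self.key = f"diarization:{uri}"
        # Any object with an rpush(key, value) method works here.
        self.client = client

    def write_turn(self, speaker, start, end):
        # Serialize one speaker turn and append it to the list for this URI.
        payload = json.dumps({"speaker": speaker, "start": start, "end": end})
        self.client.rpush(self.key, payload)


class FakeClient:
    """In-memory stand-in for a Redis client, useful for tests."""

    def __init__(self):
        self.lists = {}

    def rpush(self, key, value):
        self.lists.setdefault(key, []).append(value)
        return len(self.lists[key])
```

Because the client is injected, the same class can be exercised with `FakeClient()` in tests and with a real `redis.Redis(...)` instance in production.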
Thank you!
This is how I run it:
import redis

from diart import SpeakerDiarization
from diart.sources import FileAudioSource
from diart.inference import StreamingInference
from diart.sinks import RedisWriter  # added in this PR

# Assumption: a Redis server reachable on localhost; adjust host and port as needed
redis_client = redis.Redis(host="localhost", port=6379)

pipeline = SpeakerDiarization(return_embeddings=True)  # also return speaker embeddings
audio_file_path = "path/to/audio.wav"  # Replace with the path to your audio file
sample_rate = 16000
file_source = FileAudioSource(audio_file_path, sample_rate)  # Stream audio from a file
inference = StreamingInference(pipeline, file_source, do_plot=False)
inference.attach_observers(RedisWriter(file_source.uri, redis_client))  # RedisWriter instead of RTTMWriter
prediction = inference()
Setting centroids at the beginning of a conversation is missing from the current code.
Do you have a specific use case for setting centroids? Would you find it helpful?
@DmitriyG228 if we can't set speaker centroids in the beginning, then this is not implementing "persistent speaker embeddings across conversations". Maybe I'm not understanding the use case very well, could you clarify this?
I have a similar requirement: I need to recognize the "moderator" in a conversation. I am sure the moderator speaks for the first 30 seconds. I do not need to recognize the other speakers, but I need to know reliably when the moderator is speaking again. A sort of "fix this centroid forever"; I do not care about the others.
Is there any way to do this in the current implementation? Is it possible to store the embedding of a voice and recognize just that speaker during a conversation? The other speakers can be wrong; I will not use them.
@vtontodonato that would be an interesting feature to add, but I think it's unrelated to this PR. If you're up for it, I would suggest adding a freeze_centroids(centroids: list[int]) method to OnlineSpeakerClustering, where you would simply keep track of the "frozen" centroids and prevent their updates in identify(). I would gladly merge a PR with this feature!
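The idea could look roughly like the sketch below. It is a toy stand-in written in pure Python, not diart's OnlineSpeakerClustering: the class name, the `delta_new` threshold, the cosine distance and the running-sum update are all assumptions chosen to keep the example self-contained, and it assumes non-zero embeddings. Only the `freeze_centroids` and `identify` names come from the suggestion above.

```python
import math


def cosine_distance(a, b):
    """Cosine distance between two non-zero vectors given as lists of floats."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / norm


class ToyOnlineClustering:
    """Toy online clustering with freezable centroids (illustrative only)."""

    def __init__(self, delta_new=0.5):
        self.delta_new = delta_new  # assumed threshold for creating a new centroid
        self.centroids = []         # centroid i corresponds to speaker_i
        self.frozen = set()         # indices of centroids that must not change

    def freeze_centroids(self, centroids):
        """Remember the given centroid indices as frozen."""
        self.frozen.update(centroids)

    def identify(self, embedding):
        """Return the index of the matching centroid, updating it unless frozen."""
        if self.centroids:
            dists = [cosine_distance(c, embedding) for c in self.centroids]
            best = min(range(len(dists)), key=dists.__getitem__)
            if dists[best] <= self.delta_new:
                if best not in self.frozen:
                    # Running-sum update (an assumption of this sketch).
                    self.centroids[best] = [
                        c + e for c, e in zip(self.centroids[best], embedding)
                    ]
                return best
        # No centroid is close enough: start a new speaker.
        self.centroids.append(list(embedding))
        return len(self.centroids) - 1
```

In the moderator scenario, one would feed the first 30 seconds of embeddings through `identify`, call `freeze_centroids([0])`, and from then on treat any embedding assigned to index 0 as the moderator.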
I'll take a look at those methods ... and send you whatever modifications I manage to make to fit diart to my scenario.