Follow-up to Feature Request: Implementing Persistent Speaker Embeddings Across Conversations #227

Open DmitriyG228 opened this issue 1 year ago • 6 comments

Hey @juanmc2005,

Following up on issue #227:

  • I have implemented speaker centroids as a list of embeddings without an explicit mapping, assuming speaker_0 == centroid 0 inside SpeakerDiarization
  • I added a better version of RedisWriter
  • It looks like importing redis is unnecessary if we pass a redis_client object into RedisWriter
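
To illustrate the last point, here is a minimal sketch of a sink that only calls methods on the client it receives, so its module never needs `import redis`. `InjectedClientWriter`, `FakeClient`, and the "one list entry per segment" layout are hypothetical names and choices made for this example; they are not the RedisWriter from this PR.

```python
class InjectedClientWriter:
    """Hypothetical sink: stores one text row per segment via an injected client."""

    def __init__(self, uri, client):
        self.uri = uri
        # Any object with an rpush(key, value) method works here,
        # e.g. a redis.Redis instance created by the caller.
        self.client = client

    def on_next(self, segments):
        # segments: iterable of (start_seconds, end_seconds, speaker_label)
        for start, end, speaker in segments:
            self.client.rpush(self.uri, f"{start:.3f} {end:.3f} {speaker}")


class FakeClient:
    """In-memory stand-in for a Redis client, used only for this example."""

    def __init__(self):
        self.lists = {}

    def rpush(self, key, value):
        self.lists.setdefault(key, []).append(value)
        return len(self.lists[key])
```

Because the client is passed in, the same writer can be exercised with `FakeClient` in tests and with a real client in production.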

Thank you!

DmitriyG228 avatar Jan 04 '24 19:01 DmitriyG228

This is how I run it:

import redis  # needed only here, to build the client that RedisWriter receives

from diart import SpeakerDiarization
from diart.sources import FileAudioSource
from diart.inference import StreamingInference
from diart.sinks import RedisWriter

# Assumption: a Redis server on the default local port; adjust to your setup
redis_client = redis.Redis(host="localhost", port=6379)

pipeline = SpeakerDiarization(return_embeddings=True)  # note return_embeddings=True

audio_file_path = "path/to/audio.wav"  # Replace with the path to your audio file
sample_rate = 16000
file_source = FileAudioSource(audio_file_path, sample_rate)

inference = StreamingInference(pipeline, file_source, do_plot=False)
# RedisWriter takes the place of RTTMWriter
inference.attach_observers(RedisWriter(file_source.uri, redis_client))

prediction = inference()

DmitriyG228 avatar Jan 04 '24 20:01 DmitriyG228

Setting centroids at the beginning of a conversation is missing from the current code.

Do you have a specific use case for setting centroids? Would you find it helpful?

DmitriyG228 avatar Jan 06 '24 11:01 DmitriyG228

@DmitriyG228 if we can't set speaker centroids at the beginning, then this is not implementing "persistent speaker embeddings across conversations". Maybe I'm not understanding the use case very well. Could you clarify this?

juanmc2005 avatar Feb 02 '24 15:02 juanmc2005

I have a similar requirement: I need to recognize the "moderator" in a conversation. I am sure the moderator speaks for the first 30 seconds. I do not need to recognize the other speakers, but I need to be sure when the moderator is speaking again. A sort of "fix this centroid forever"; I do not care about the others.

Is there any way to do this in the current implementation? Is it possible to store the embedding of a voice and recognize just that speaker during a conversation? The other speakers can be wrong, since I won't use them.

vtontodonato avatar Mar 07 '24 10:03 vtontodonato

@vtontodonato that would be an interesting feature to add, but I think it's unrelated to this PR. If you're up for it, I would suggest adding a freeze_centroids(centroids: list[int]) method to OnlineSpeakerClustering, where you would simply keep track of the "frozen" centroids and prevent their updates in identify(). I would gladly merge a PR with this feature!
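
A rough sketch of the idea, using a toy class rather than diart's actual OnlineSpeakerClustering: `ToyOnlineClustering`, its nearest-centroid assignment, and its additive update rule are all assumptions made for illustration. Only the `freeze_centroids(centroids: list[int])` signature and the "skip updates in identify()" behavior come from the suggestion above.

```python
import math


def cosine_distance(a, b):
    """Cosine distance between two non-zero vectors given as lists of floats."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / norm


class ToyOnlineClustering:
    """Toy stand-in for an online clustering step (not diart's implementation)."""

    def __init__(self, centroids):
        self.centroids = [list(c) for c in centroids]
        self.frozen = set()  # indices of centroids that must never be updated

    def freeze_centroids(self, centroids: list[int]) -> None:
        """Mark the given centroid indices as frozen."""
        self.frozen.update(centroids)

    def identify(self, embedding):
        """Return the index of the nearest centroid, updating it unless frozen."""
        distances = [cosine_distance(embedding, c) for c in self.centroids]
        nearest = distances.index(min(distances))
        if nearest not in self.frozen:
            # Assumed update rule: accumulate the embedding into the centroid.
            self.centroids[nearest] = [
                c + e for c, e in zip(self.centroids[nearest], embedding)
            ]
        return nearest


# Usage: freeze centroid 0 (e.g. the moderator) and leave centroid 1 free to adapt.
clustering = ToyOnlineClustering([[1.0, 0.0], [0.0, 1.0]])
clustering.freeze_centroids([0])
moderator_index = clustering.identify([0.75, 0.25])  # nearest to centroid 0, which stays unchanged
```

In the moderator scenario, the frozen centroid would first be built from the opening 30 seconds and then frozen, so later speech can still be matched to it without drifting.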

juanmc2005 avatar Mar 08 '24 10:03 juanmc2005

I'll take a look at those methods and send you whatever modifications I manage to make to fit DIART to my scenario.

vtontodonato avatar Mar 08 '24 10:03 vtontodonato