Juan Coria
That's a good idea. I think that would require major changes in `SpeakerMap`, because right now it doesn't have a way of knowing who overlaps whom. Or maybe it...
Hi @nefastosaturo, and thank you :) From a quick look at `SpeechBrainPretrainedSpeakerEmbedding` in pyannote.audio, it looks like the `__call__` method expects masks rather than weights. It should have a...
I implemented a working version using SpeechBrain embeddings, and it seems to work well with OSP weights as masks, even without normalization. I don't know if this is the same...
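In case it helps, this is roughly what I mean, as a minimal sketch. The shapes and frame count are illustrative, and the mask handling is my understanding of the pyannote.audio wrapper, so double-check against your version:

```python
import torch
from pyannote.audio.pipelines.speaker_verification import SpeechBrainPretrainedSpeakerEmbedding

# Load a SpeechBrain embedding model through pyannote's wrapper
embedding = SpeechBrainPretrainedSpeakerEmbedding(
    "speechbrain/spkrec-ecapa-voxceleb", device=torch.device("cpu")
)

num_frames = 293  # frame count depends on the segmentation model; illustrative only
waveform = torch.randn(3, 1, 5 * 16000)  # (batch, channel, samples): 3 local speakers, 5s at 16kHz
weights = torch.rand(3, num_frames)      # per-speaker OSP weights at segmentation-frame resolution

# Pass the weights where the wrapper expects masks; as far as I can tell they are
# interpolated to sample resolution and binarized internally, which would explain
# why unnormalized weights seem to work fine.
embeddings = embedding(waveform, masks=weights)  # -> (3, dimension) numpy array
print(embeddings.shape)
```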
I didn't have time or access to a GPU, but I'll take a look at that when I have some free time.
> @hbredin, using a single wav file as a test, the diart pipeline with the SpeechBrain embeddings computes quite fast on my GPU (audio file: ~24s, pipeline computation time...
> The diarization pipeline above processes the audio file in 5.550771630001691 seconds. Ok, thanks for the info. So if I'm not mistaken, that's about 140ms per chunk for the...
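For reference, the back-of-the-envelope math behind that number, assuming diart's default sliding window of 5s with a 500ms step:

```python
# Number of sliding-window chunks in a 24s file, assuming diart's defaults
# (duration=5s, step=0.5s); adjust if the pipeline was configured differently.
duration, step, total = 5.0, 0.5, 24.0
num_chunks = 1 + int((total - duration) / step)   # 39 chunks
print(5.550771630001691 / num_chunks)             # ~0.142s, i.e. about 140ms per chunk
```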
@nefastosaturo, any news on this? Recently I've been working a lot on adding custom models (#43), optimizing thresholds (#53), and running a faster batched inference...
@zaouk, we talked about this a few days ago.
Hi @RahmaYasser, yes, this would be a nice feature to add, and it's possibly related to torchaudio streams (see #27), which are still in beta. I would suggest implementing your...
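For example, something along these lines, as a sketch based on torchaudio's beta `StreamReader` API (the file path and chunk size are placeholders; check the docs for your torchaudio version):

```python
from torchaudio.io import StreamReader

# "audio.wav" is a placeholder; StreamReader also accepts URLs and devices
streamer = StreamReader(src="audio.wav")
streamer.add_basic_audio_stream(frames_per_chunk=8000, sample_rate=16000)  # 0.5s chunks at 16kHz

for (chunk,) in streamer.stream():
    # chunk: (frames, channels) float tensor; this is where you'd feed the pipeline
    print(chunk.shape)
```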
Hi @bitnom, can you post the full stack trace of the error? Are you sure that `mic_data` has the correct shape that the segmentation and/or ASR model is expecting?
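A quick check like this usually helps. The `(batch, channels, samples)` layout is what pyannote segmentation models expect, but your ASR model may differ, and `mic_data` here is just a stand-in for your real buffer:

```python
import numpy as np
import torch

mic_data = np.zeros(16000, dtype=np.float32)  # stand-in for the real microphone buffer
print(mic_data.shape, mic_data.dtype)          # make sure this matches expectations

# Reshape a flat mono buffer to (batch=1, channel=1, samples) before calling the model
chunk = torch.from_numpy(mic_data).reshape(1, 1, -1)
print(chunk.shape)  # torch.Size([1, 1, 16000])
```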