Maintaining state across file audio source chunks
Hello,
I am looking for a way to do chunk-based inference on audio files instead of streaming inference. The issue is that each audio file currently gets its own inference run and therefore its own state (new speaker embeddings), which is unwanted behaviour for my program.
How can I achieve the desired behaviour of running inference on larger chunks of audio (such as 20 seconds) while keeping the pipeline state across chunks?
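For illustration, here is a minimal sketch of one way this could work: reuse a single `SpeakerDiarization` pipeline object for every chunk file, since the pipeline instance holds the clustering state (speaker embeddings). The file names are placeholders, and whether `StreamingInference` resets the pipeline between runs should be verified against your diart version; if it does, calling the pipeline directly on chunks would be the alternative.

```python
from diart import SpeakerDiarization
from diart.inference import StreamingInference
from diart.sources import FileAudioSource

# One pipeline instance -> one shared clustering state (speaker embeddings)
pipeline = SpeakerDiarization()

# Hypothetical 20-second chunk files produced by splitting a longer recording
for path in ["chunk_000.wav", "chunk_001.wav", "chunk_002.wav"]:
    source = FileAudioSource(path, sample_rate=pipeline.config.sample_rate)
    inference = StreamingInference(pipeline, source, show_progress=False)
    prediction = inference()  # speaker-labeled annotation for this chunk
    print(path, prediction)

# To start over for an unrelated recording, clear the state explicitly:
# pipeline.reset()
```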
Hi @Aduomas, given your description, do you actually require a streaming pipeline? It looks like pyannote.audio could achieve what you want.
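If streaming is not actually needed, an offline pyannote.audio pipeline processes the whole file in one run, so all chunks share a single set of speakers by construction. A minimal sketch, assuming a pretrained checkpoint name and a placeholder Hugging Face token (check the model hub for the current checkpoint):

```python
from pyannote.audio import Pipeline

# Assumed checkpoint name; requires a Hugging Face access token
pipeline = Pipeline.from_pretrained(
    "pyannote/speaker-diarization-3.1",
    use_auth_token="HF_TOKEN",  # placeholder token
)

# Offline inference over the whole file: one run, one consistent speaker set
diarization = pipeline("recording.wav")
for turn, _, speaker in diarization.itertracks(yield_label=True):
    print(f"{turn.start:.1f}s - {turn.end:.1f}s: {speaker}")
```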