memory optimizations for pyannote.audio.core.inference.Inference.aggregate()
While diarizing long audio recordings (>6 hours), I noticed very high memory usage, upwards of 30GB.
I tracked the spike to pyannote.audio.core.inference.Inference.aggregate(), which was initializing several very large tensors.
With this PR, RAM usage is reduced by 10 to 15 GB for long audio files in my tests. I have not tested extensively, but I do not believe this impacts accuracy or speed.
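For readers unfamiliar with this kind of aggregation, here is a minimal NumPy sketch of overlap-add averaging that keeps only two full-length buffers. It is not the actual pyannote.audio implementation: the function name, the `step_frames` parameter, and the float32 default are assumptions made for illustration.

```python
import numpy as np


def aggregate_overlap_add(chunk_scores, step_frames, dtype=np.float32):
    """Average overlapping chunk scores into one frame-level array.

    chunk_scores: array of shape (num_chunks, frames_per_chunk, num_classes)
    step_frames:  hop between consecutive chunks, in frames (hypothetical name)
    """
    num_chunks, frames_per_chunk, num_classes = chunk_scores.shape
    num_frames = (num_chunks - 1) * step_frames + frames_per_chunk

    # Only two full-length buffers are allocated: the running sum of scores
    # and the number of chunks covering each frame.
    total = np.zeros((num_frames, num_classes), dtype=dtype)
    count = np.zeros((num_frames, 1), dtype=dtype)

    for i in range(num_chunks):
        start = i * step_frames
        stop = start + frames_per_chunk
        total[start:stop] += chunk_scores[i]
        count[start:stop] += 1

    # Guard against frames that no chunk covers (possible if the step is
    # larger than the chunk length), then divide in place so that no third
    # full-length array is created.
    np.maximum(count, 1, out=count)
    np.divide(total, count, out=total)
    return total
```

Accumulating into preallocated buffers and dividing in place is one common way to keep peak memory close to the size of the output itself, at the cost of a Python-level loop over chunks.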
I do have one question about one of the commits.
Currently, frames is recreated only so that it has the same start as chunks, but as far as I understand, there is no case where chunks.start and frames.start would be anything other than 0.0.
Is this a correct assumption? If not, frames should still be reinitialized.
Now, the whole speaker diarization pipeline does not peak above 20 GB of RAM for a 9-hour recording. That peak is set by both Inference.aggregate and scipy.cluster.hierarchy.linkage in the AgglomerativeClustering pipeline.
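As a rough guide to why linkage can dominate memory, the sketch below estimates the size of the condensed pairwise distance matrix that hierarchical clustering works from, which holds n * (n - 1) / 2 values for n observations. The helper name and the embedding count in the usage example are hypothetical, and the estimate assumes 8-byte floats and ignores any other temporary buffers.

```python
def condensed_distance_bytes(num_observations, bytes_per_value=8):
    """Estimate the memory needed for a condensed pairwise distance matrix.

    A condensed matrix stores one value per unordered pair of observations,
    i.e. n * (n - 1) / 2 values. bytes_per_value=8 assumes float64.
    """
    num_pairs = num_observations * (num_observations - 1) // 2
    return num_pairs * bytes_per_value


# Hypothetical example: 50,000 embeddings extracted from a long recording.
estimate = condensed_distance_bytes(50_000)
print(f"{estimate / 1e9:.1f} GB")
```

Because the cost grows quadratically with the number of observations, doubling the recording length roughly quadruples this part of the memory footprint.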
Rebased the changes onto the most recent develop, then fixed an incorrect git authorship config on my end.
Rebased and added back the frames section.
Merged! 🎉 Thanks a lot for your contribution. Will be part of next release.
Awesome! I really appreciate your work. pyannote has become an invaluable tool, so I'm glad I can give back in my small way.
I'd love to know more about how pyannote impacts your work. Feel free to drop me an email!