pyannote-audio

memory optimizations for pyannote.audio.core.inference.Inference.aggregate()

Open benniekiss opened this issue 9 months ago • 2 comments

While diarizing long audio recordings (>6 hours), I noticed very high memory usage, upwards of 30 GB. I tracked the spike to pyannote.audio.core.inference.Inference.aggregate(), which initializes several very large tensors.

With this PR, RAM usage is reduced by 10-15 GB for long audio files in my tests. I have not tested extensively, but I do not believe this impacts accuracy or speed.
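
To illustrate the kind of change that helps here, below is a minimal NumPy sketch of overlap-add aggregation that keeps only two accumulators alive and updates them in place. It is not the actual pyannote.audio code: the function name `aggregate_overlap_add`, the `step_frames` parameter, and the array layout are assumptions made for the example.

```python
import numpy as np


def aggregate_overlap_add(chunks, step_frames, dtype=np.float32):
    """Average overlapping chunk scores into one frame-level array.

    Hypothetical helper for illustration only.
    chunks: array of shape (num_chunks, frames_per_chunk, num_classes),
            where chunk i starts at frame i * step_frames.
    """
    num_chunks, frames_per_chunk, num_classes = chunks.shape
    num_frames = step_frames * (num_chunks - 1) + frames_per_chunk

    # Only two full-length buffers are allocated: the running sum and
    # the per-frame overlap count (one column, broadcast on division).
    total = np.zeros((num_frames, num_classes), dtype=dtype)
    count = np.zeros((num_frames, 1), dtype=dtype)

    for i in range(num_chunks):
        start = i * step_frames
        stop = start + frames_per_chunk
        # In-place accumulation on slices avoids full-length temporaries.
        total[start:stop] += chunks[i]
        count[start:stop] += 1

    # Guard against division by zero if chunks do not cover every frame,
    # then divide in place instead of creating a third large array.
    np.maximum(count, 1, out=count)
    np.divide(total, count, out=total)
    return total
```

The savings in a sketch like this come from using a smaller dtype, broadcasting a single-column count instead of a full-size one, and writing results with `out=` rather than allocating new arrays.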

I do have one question about one of the commits:

Currently, frames is recreated only so that it has the same start as chunks, but from my understanding, there are no cases where chunks.start and frames.start would be anything other than 0.0.

Is this a correct assumption? Otherwise, frames should be reinitialized.

Now, the whole speaker diarization pipeline does not peak past 20 GB of RAM for a 9-hour recording. The remaining peak is constrained by both Inference.aggregate and scipy.cluster.hierarchy.linkage in the AgglomerativeClustering pipeline.
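
For a rough sense of why linkage becomes a bottleneck: as far as I understand, when scipy.cluster.hierarchy.linkage is given an array of observation vectors it first builds a condensed pairwise distance matrix with n * (n - 1) / 2 float64 entries, so memory grows quadratically with the number of embeddings. The sketch below only does that arithmetic. The helper name and the embedding count of 50,000 are hypothetical and not measured from my recordings.

```python
def condensed_distance_bytes(num_embeddings, bytes_per_value=8):
    """Size in bytes of a condensed pairwise distance matrix.

    Hypothetical helper: assumes one float64 (8 bytes) per pair.
    """
    num_pairs = num_embeddings * (num_embeddings - 1) // 2
    return num_pairs * bytes_per_value


# Hypothetical embedding count, chosen only to show the scaling.
size_gb = condensed_distance_bytes(50_000) / 1e9
print(f"{size_gb:.1f} GB")
```

Doubling the number of embeddings roughly quadruples this figure, which is why very long recordings hit the limit quickly.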

benniekiss, May 17 '24 22:05