
Diarization

Open ggerganov opened this issue 1 year ago • 0 comments

Some unsuccessful experiments with audio embedding clustering

Tried applying fuzzy C-means clustering to:

  • embeddings after the initial convolution in the encoder
  • self KV embeddings from each encoder layer
  • KQV embeddings from each encoder layer
  • embeddings from the last encoder layer
  • cross KV embeddings of each decoder layer
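
For reference, a minimal NumPy sketch of fuzzy C-means as it could be applied to any of the embeddings listed above. This is not the code used in the experiments. The function name `fuzzy_c_means`, the fuzzifier `m = 2`, the random initialization, and the `(n_frames, dim)` input layout are all assumptions made for illustration.

```python
import numpy as np

def fuzzy_c_means(X, n_clusters=2, m=2.0, n_iter=100, tol=1e-5, seed=0):
    """Fuzzy C-means on X with shape (n_frames, dim).

    Returns (centers, U), where U[i, j] is the soft membership of
    frame i in cluster j and every row of U sums to 1.
    All defaults here are illustrative assumptions.
    """
    rng = np.random.default_rng(seed)
    n = X.shape[0]

    # Random initial memberships, normalized per frame.
    U = rng.random((n, n_clusters))
    U /= U.sum(axis=1, keepdims=True)

    for _ in range(n_iter):
        # Cluster centers: means of the frames weighted by U^m.
        W = U ** m
        centers = (W.T @ X) / W.sum(axis=0)[:, None]

        # Euclidean distance from every frame to every center.
        d = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        d = np.maximum(d, 1e-12)  # avoid division by zero

        # Membership update: u_ij proportional to d_ij^(-2 / (m - 1)).
        inv = d ** (-2.0 / (m - 1.0))
        U_new = inv / inv.sum(axis=1, keepdims=True)

        converged = np.abs(U_new - U).max() < tol
        U = U_new
        if converged:
            break

    return centers, U
```

For diarization, `U.argmax(axis=1)` would give a hard speaker label per frame, while the soft memberships show how confident each assignment is.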

Instead of clustering the full embedding dimensions, first reduce dimensionality using SVD:

  • decompose the embeddings E = U S V^T
  • compute singular vectors U' = US
  • project E on U' and take the top few coordinates
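
One possible reading of the steps above, sketched with NumPy. It assumes E is stored as `(n_frames, dim)`. The columns of U S are the coordinates of each frame along the right singular vectors, so keeping the first k columns is equivalent to projecting E onto the top k singular directions. The function name `reduce_with_svd` and the default `k = 3` are assumptions, not taken from the experiments.

```python
import numpy as np

def reduce_with_svd(E, k=3):
    """Reduce embeddings E of shape (n_frames, dim) to k coordinates per frame.

    Illustrative sketch: the name and default k are assumptions.
    """
    # Thin SVD: E = U @ diag(S) @ Vt, singular values in descending order.
    U, S, Vt = np.linalg.svd(E, full_matrices=False)

    # Scale the left singular vectors by the singular values (U' = U S)
    # and keep the top-k columns. This equals E @ Vt[:k].T.
    return U[:, :k] * S[:k]
```

The reduced `(n_frames, k)` matrix can then be passed to the clustering step in place of the full embeddings.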

ggerganov, Nov 08 '22 18:11