fairseq

How did you train the k-means clustering model on the HuBERT model?

Open • Remaxic opened this issue 11 months ago • 1 comment

❓ Questions and Help

My question

Hello! For a downstream task, I need to perform k-means clustering on the output of ContentVec, a model with the same architecture as HuBERT but trained with a different approach. I extracted ContentVec features from my dataset and trained a clustering model using the code you provided. However, the resulting clustering is far less effective than the k-means model you released for HuBERT.

Do you apply any special preprocessing to the features (such as dimensionality reduction) before training the clustering model? Or is my dataset too small (7430431 × 768)? Any suggestions for improving my clustering would be much appreciated!
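If you want to test whether dimensionality reduction helps, a minimal sketch is to subsample frames and fit PCA with NumPy's SVD before clustering. Whether the released HuBERT k-means used any such preprocessing is not stated here; `subsample_rows`, `fit_pca`, `apply_pca`, `feature_bytes`, and any fraction or component count you pick are hypothetical. For scale, 7430431 × 768 float32 values take about 22.8 GB, so fitting on a random subset is often the practical option.

```python
import numpy as np


def feature_bytes(n_frames, dim, bytes_per_value=4):
    """Memory needed to hold an (n_frames, dim) array (float32 by default)."""
    return n_frames * dim * bytes_per_value


def subsample_rows(x, fraction, seed=0):
    """Randomly keep a fraction of the frames (rows), without replacement."""
    rng = np.random.default_rng(seed)
    n_keep = max(1, int(len(x) * fraction))
    idx = rng.choice(len(x), size=n_keep, replace=False)
    return x[idx]


def fit_pca(x, n_components):
    """Fit PCA via SVD of the centered data; returns (mean, components)."""
    x = np.asarray(x, dtype=np.float64)
    mean = x.mean(axis=0)
    # Rows of vt are principal directions, sorted by decreasing singular value.
    _, _, vt = np.linalg.svd(x - mean, full_matrices=False)
    return mean, vt[:n_components]


def apply_pca(x, mean, components):
    """Project the rows of x onto the principal components."""
    return (np.asarray(x, dtype=np.float64) - mean) @ components.T
```

For example, `mean, comps = fit_pca(subsample_rows(feats, 0.1), 256)` and then cluster `apply_pca(feats, mean, comps)`; here `feats`, 0.1 and 256 are arbitrary example values.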

The code I have tried for clustering:

[Screenshot of the clustering code; image not preserved]
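For reference, here is a minimal NumPy sketch of the training step described above: k-means++ seeding followed by plain Lloyd iterations on frame-level features (one row per frame). It is not the fairseq script; as far as I know, fairseq's HuBERT `simple_kmeans` example uses scikit-learn's `MiniBatchKMeans`, which updates centroids from mini-batches instead. The names `train_kmeans` and `assign` and the defaults `n_iter=100`, `seed=0` are illustrative assumptions.

```python
import numpy as np


def kmeans_pp_init(x, k, rng):
    """Choose k initial centroids with k-means++ seeding."""
    n = x.shape[0]
    first = rng.integers(n)
    centroids = [x[first]]
    # Squared distance from every point to its nearest chosen centroid so far.
    d2 = ((x - x[first]) ** 2).sum(axis=1)
    for _ in range(1, k):
        total = d2.sum()
        if total == 0:  # every point already coincides with a centroid
            idx = rng.integers(n)
        else:
            idx = rng.choice(n, p=d2 / total)
        centroids.append(x[idx])
        d2 = np.minimum(d2, ((x - x[idx]) ** 2).sum(axis=1))
    return np.stack(centroids)


def assign(x, centroids):
    """Index of the nearest centroid for each row of x."""
    # ||x - c||^2 = ||x||^2 - 2 x.c + ||c||^2, avoiding an (n, k, d) tensor.
    d2 = (
        (x ** 2).sum(axis=1, keepdims=True)
        - 2.0 * x @ centroids.T
        + (centroids ** 2).sum(axis=1)
    )
    return d2.argmin(axis=1)


def train_kmeans(x, k, n_iter=100, seed=0):
    """Plain Lloyd's k-means with k-means++ init; returns (k, d) centroids."""
    x = np.asarray(x, dtype=np.float64)
    rng = np.random.default_rng(seed)
    centroids = kmeans_pp_init(x, k, rng)
    for _ in range(n_iter):
        labels = assign(x, centroids)
        new = centroids.copy()
        for j in range(k):
            members = x[labels == j]
            if len(members):  # keep the old centroid if a cluster is empty
                new[j] = members.mean(axis=0)
        if np.allclose(new, centroids):
            break  # converged
        centroids = new
    return centroids
```

Usage, assuming `feats` is an `(n_frames, 768)` array: `centroids = train_kmeans(feats, k=100)` and then `units = assign(feats, centroids)`, where 100 clusters is a hypothetical choice. Full-batch iterations over millions of frames are slow, which is one reason a mini-batch variant or a random subsample is commonly used instead.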

Remaxic • Mar 16 '24 10:03

I have a similar question. Using the pre-trained HuBERT, k-means, and unit vocoder that you provide produces good audio. But when I train k-means clustering on the LJ Speech dataset (https://keithito.com/LJ-Speech-Dataset/), which has around 13k audio clips, and then synthesize .wav files with your pre-trained unit vocoder, the output does not sound good.
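After k-means is trained, the units passed to a unit vocoder come from mapping each frame to its nearest centroid. A minimal NumPy sketch follows; `features_to_units` and `dedup_units` are hypothetical names, and whether the released unit vocoder expects consecutive duplicate units to be collapsed is an assumption to check against its configuration. Note also that cluster indices are arbitrary: unit 17 of a newly trained k-means model has no relation to unit 17 of the released one, so a vocoder trained on units from the released codebook cannot interpret units from a different codebook.

```python
import numpy as np


def features_to_units(features, centroids):
    """Map each frame (row) to the index of its nearest centroid."""
    d2 = (
        (features ** 2).sum(axis=1, keepdims=True)
        - 2.0 * features @ centroids.T
        + (centroids ** 2).sum(axis=1)
    )
    return d2.argmin(axis=1)


def dedup_units(units):
    """Collapse runs of the same unit, e.g. [5, 5, 2, 2, 2, 5] -> [5, 2, 5]."""
    units = np.asarray(units)
    if units.size == 0:
        return units
    keep = np.ones(units.size, dtype=bool)
    keep[1:] = units[1:] != units[:-1]  # keep a unit only if it changed
    return units[keep]
```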

My questions are: What data were the pre-trained k-means models trained on? What hyperparameters were used (e.g., number of epochs, batch size)? Is there anything else important for training the k-means model that is not mentioned in the paper?

Thanks in advance

mahendraphd • Jul 08 '24 11:07