
Align Models and Fine-Tuning

Open Infinitay opened this issue 1 year ago • 0 comments

I'm trying to use whisperX for Korean and ran into some questions about how to do so. Since there is no default alignment model for Korean, I went to Hugging Face (HF) to find one. As expected, there are many models, and I'm not sure which to choose, especially because some of the recent models trained by slplab lack evaluation results. I'm also not sure what data they, or any of the other models, were evaluated on. I'm not familiar with wav2vec2 or the benchmarks used to evaluate it, let alone what these other fine-tuned models are benchmarked against.

Back on topic: does it matter whether the model I select was fine-tuned from wav2vec2-large-xlsr-53? I ask because the current default Japanese model was fine-tuned from it, and for Chinese, another model fine-tuned from wav2vec2-large-xlsr-53 was selected, according to #7.

Do I have to choose a model like slplab/wav2vec2-large-xlsr-53-korean-samsung-54k or could I use thisisHJLee/wav2vec2-large-xls-r-1b-korean-sample5?
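One check that might help compare candidates, whichever base checkpoint they were fine-tuned from, is whether a model's CTC vocabulary covers the characters that appear in the transcript, since character-level alignment needs a label for each character. The sketch below is only an illustration under that assumption: `uncovered_chars` and `toy_vocab` are hypothetical names, and the toy vocabulary stands in for what a real tokenizer would report (for example via `get_vocab()` in transformers). It says nothing about which of the two models above is actually better.

```python
from typing import Dict, Set


def uncovered_chars(
    vocab: Dict[str, int], transcript: str, word_delimiter: str = "|"
) -> Set[str]:
    """Return the transcript characters that have no label in the vocabulary."""
    labels = {token.lower() for token in vocab}
    missing: Set[str] = set()
    for ch in transcript.lower():
        # Assumption: spaces are represented by a word-delimiter token,
        # as is common in wav2vec2 CTC vocabularies.
        token = word_delimiter if ch == " " else ch
        if token not in labels:
            missing.add(ch)
    return missing


# Hypothetical toy vocabulary, standing in for a real checkpoint's vocabulary.
toy_vocab = {"<pad>": 0, "|": 1, "안": 2, "녕": 3}

# Characters in the transcript that this vocabulary could not align.
print(uncovered_chars(toy_vocab, "안녕 하"))
```

An empty result would suggest the vocabulary can at least represent the transcript; a non-empty one would point to characters the model has no label for.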

Infinitay • Dec 22 '22 19:12