ASR-Wav2vec-Finetune

Can I use an English dataset for this repo?

Open Shaobo-Z opened this issue 1 year ago • 13 comments

In the source code, you used Vietnamese data for training and validation. If I want to fine-tune an English model on an English dataset, is there anything I should change?

Shaobo-Z avatar Jul 02 '23 06:07 Shaobo-Z

No, you just have to prepare the English dataset

khanld avatar Jul 02 '23 10:07 khanld

This is what my dataset looks like ⬇ image

And this is what I got ⬇. train_loss, train_lr, etc. do change. However, train_wer is always 1.0000. image
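For context, a train_wer of exactly 1.0000 is what you would get if, for example, every decoded prediction were an empty string or shared no words with the label. Below is a minimal sketch of word error rate written from its standard definition (word-level edit distance divided by the number of reference words). It is not taken from this repo's metric code, and the function name `wer` is only illustrative.

```python
def wer(reference: str, hypothesis: str) -> float:
    """Word error rate: word-level edit distance divided by reference length."""
    ref, hyp = reference.split(), hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # At the start of iteration i, prev[j] is the edit distance
    # between ref[:i-1] and hyp[:j].
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1] / len(ref)
```

Printing a few decoded predictions next to their labels, then passing them through a function like this, should show whether the score comes from empty outputs or from wrong words.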

Checked:

  1. Sample rate: librosa.get_samplerate returns 16000.
  2. The transcripts are correct.
  3. I only modified file_path and iteration in the config file.
  4. The pre-trained model is facebook/wav2vec2-base.

I have tried several approaches, but the result stays the same. Any ideas, please?
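The sample-rate check in item 1 can also be run over a whole dataset without librosa. This is a minimal sketch using the standard-library `wave` module, assuming the files are uncompressed PCM .wav. The helper name `find_wrong_rate` is hypothetical.

```python
import wave

def find_wrong_rate(paths, expected_hz=16000):
    """Return (path, rate) for every WAV file whose sample rate differs from expected_hz."""
    mismatches = []
    for path in paths:
        with wave.open(path, "rb") as wav_file:
            rate = wav_file.getframerate()
        if rate != expected_hz:
            mismatches.append((path, rate))
    return mismatches
```

An empty list means every file in `paths` matches the expected rate.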

Shaobo-Z avatar Jul 02 '23 11:07 Shaobo-Z

I can see that your dataset is relatively small, so there are only 5 update steps per epoch. Have you tried a longer run to check whether the behavior persists? Also check whether the vocab.json file contains the correct English characters.
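One way to run the vocab.json check is to compare the characters in the transcripts with the tokens in the file. The sketch below assumes vocab.json is a JSON object mapping each token to an id, and that spaces are stored as a word delimiter token "|" (the usual Hugging Face CTC tokenizer convention). Both are assumptions about this repo's file, and `missing_characters` is a hypothetical helper.

```python
import json

def missing_characters(vocab_path, transcripts, word_delimiter="|"):
    """Return the sorted characters that appear in transcripts but not in the vocab file."""
    with open(vocab_path, encoding="utf-8") as f:
        vocab = json.load(f)  # assumed format: {"token": id, ...}
    needed = set()
    for text in transcripts:
        # Assumption: spaces are represented in the vocab by the word delimiter token.
        needed.update(text.replace(" ", word_delimiter))
    return sorted(needed - set(vocab))
```

A non-empty result (for instance uppercase letters when the vocab is lowercase only) would mean those characters cannot be mapped to their own tokens.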

khanld avatar Jul 02 '23 15:07 khanld

I encountered the same problem, even with a larger dataset (91 steps and 20 epochs).

ghosthunterk avatar Jul 21 '23 04:07 ghosthunterk

I have not tried datasets in other languages yet. Can you share more information about your dataset, config, TensorBoard logs, etc.?

khanld avatar Jul 21 '23 05:07 khanld

Setup: Python 3.8, with everything in requirements.txt installed via pip except torch 1.7.1. Because I have CUDA 11.4, I installed that with conda instead (conda install pytorch==1.7.1 torchvision==0.8.2 torchaudio==0.7.2 cudatoolkit=11.0 -c pytorch).

I tried both the VIVOS dataset and the Common Voice dataset. Each is stored in a .txt file written with pandas, separated by "|", with 2 columns: path (path on the server) and transcript (UTF-8 encoded).

When I printed the pred and label, I got this: image
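A quick sanity check of such a "|"-separated file can rule out missing audio paths and empty transcripts. This is a minimal sketch using only the standard library. It assumes the first line is a header reading `path|transcript`, which is not confirmed in this thread, and `check_dataset_file` is a hypothetical helper.

```python
import csv
import os

def check_dataset_file(txt_path):
    """Return (line_number, problem) pairs for a '|'-separated file with path and transcript columns."""
    problems = []
    with open(txt_path, encoding="utf-8", newline="") as f:
        # QUOTE_NONE keeps quotation marks inside transcripts as literal characters.
        reader = csv.DictReader(f, delimiter="|", quoting=csv.QUOTE_NONE)
        for row in reader:
            line_no = reader.line_num  # line number in the source file
            path, transcript = row.get("path"), row.get("transcript")
            if not path or not os.path.isfile(path):
                problems.append((line_no, "audio file not found"))
            if not transcript or not transcript.strip():
                problems.append((line_no, "empty transcript"))
    return problems
```

An empty list means every row points to an existing file and has a non-empty transcript.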

ghosthunterk avatar Jul 21 '23 06:07 ghosthunterk

The audio files are already pre-processed to a 16000 Hz sampling rate and .wav format. image

ghosthunterk avatar Jul 21 '23 06:07 ghosthunterk

I can see that your model has not converged yet; the train loss is still high. Try increasing the lr for faster training.

khanld avatar Jul 21 '23 07:07 khanld

Ping me at mail [email protected] for better debugging since I rarely check the GitHub notifications

khanld avatar Jul 21 '23 07:07 khanld

Ping me at mail [email protected] for better debugging since I rarely check the GitHub notifications

Already, thanks

ghosthunterk avatar Jul 21 '23 07:07 ghosthunterk

Is it possible to get an update on this question? What is the minimum size of the dataset? I want to train the model with a 20-minute dataset. Do you think that is possible?



Shaobo-Z avatar Jul 21 '23 07:07 Shaobo-Z

I will take a look at my code, run some experiments on English datasets, and respond to you soon @Shaobo-Z

khanld avatar Jul 21 '23 08:07 khanld

image So after experimenting for a while, I found that increasing the learning rate (to above about 1e-5) and setting the scheduler's max learning rate to >=1e-4 helped the model actually start learning after a while. Just be patient.
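To illustrate how a base learning rate and a scheduler max learning rate interact, here is a minimal sketch of a generic schedule with linear warmup followed by linear decay. It is not the scheduler this repo uses. The function name, the 10% warmup fraction, and the default values are assumptions chosen only to match the magnitudes mentioned above.

```python
def lr_at_step(step, total_steps, base_lr=1e-5, max_lr=1e-4, warmup_frac=0.1):
    """Linear warmup from base_lr to max_lr, then linear decay back toward base_lr."""
    if not 0 <= step < total_steps:
        raise ValueError("step must be in [0, total_steps)")
    warmup_steps = max(1, int(total_steps * warmup_frac))
    if step < warmup_steps:
        # Warmup phase: the rate climbs linearly from base_lr.
        return base_lr + (max_lr - base_lr) * step / warmup_steps
    # Decay phase: the rate falls linearly from max_lr.
    decay_steps = max(1, total_steps - warmup_steps)
    return max_lr - (max_lr - base_lr) * (step - warmup_steps) / decay_steps
```

With a schedule of this shape, a low max learning rate caps every step of the run, which is consistent with the slow start described above.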

ghosthunterk avatar Jul 26 '23 16:07 ghosthunterk