wav2vec2-sprint
wav2vec2-xls-r-1b_cv8_it
Hi Jonatas,
I'm trying to replicate the results you got by fine-tuning facebook/wav2vec2-xls-r-1b on Italian with the train and validation splits of Common Voice 8.0, but I can't get anywhere near the same WER.
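To be clear about the metric: by WER I mean the word-level edit distance (substitutions, deletions, insertions) divided by the number of reference words. Below is a minimal sketch of that definition in plain Python. The function name `word_error_rate` is made up for illustration, and this is not the evaluation script behind the published numbers, which may also normalize the text differently.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the number of reference words."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen so far
    # and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            substitution_cost = 0 if ref_word == hyp_word else 1
            curr.append(min(
                prev[j] + 1,                      # deletion
                curr[j - 1] + 1,                  # insertion
                prev[j - 1] + substitution_cost,  # match or substitution
            ))
        prev = curr
    return prev[-1] / len(ref)


# One dropped word out of five reference words.
print(word_error_rate("il gatto dorme sul divano", "il gatto dorme divano"))
```

Differences in casing or punctuation handling between two setups change this number, so the text normalization has to match before two WER values can be compared.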
I used the guide on this link: https://huggingface.co/blog/fine-tune-xlsr-wav2vec2
Which parameters did you use to train the final model?
Below are the parameters I used to build the model and the training_args:
from transformers import Wav2Vec2ForCTC

model_base = "facebook/wav2vec2-xls-r-1b"
model = Wav2Vec2ForCTC.from_pretrained(
    model_base,
    activation_dropout=0.05,
    attention_dropout=0.05,
    hidden_dropout=0.05,
    feat_proj_dropout=0.05,
    final_dropout=0.05,
    mask_time_prob=0.05,
    layerdrop=0.05,
    ctc_loss_reduction="mean",
    pad_token_id=processor.tokenizer.pad_token_id,
    vocab_size=len(processor.tokenizer)
)
model.freeze_feature_extractor()

model_path = "/home/test_w2v2/wav2vec2-xls-r-1b_cv8_it"
model.save_pretrained(model_path)

from transformers import TrainingArguments

repo_name = model_path
training_args = TrainingArguments(
    output_dir=repo_name,
    group_by_length=True,
    per_device_train_batch_size=8,
    gradient_accumulation_steps=8,
    evaluation_strategy="steps",
    num_train_epochs=30,
    fp16=True,
    gradient_checkpointing=True,
    save_steps=400,
    eval_steps=400,
    logging_steps=400,
    learning_rate=3e-4,
    warmup_steps=500,
    save_total_limit=2,
    push_to_hub=False
)

from transformers import Trainer

trainer = Trainer(
    model=model,
    data_collator=data_collator,
    args=training_args,
    compute_metrics=compute_metrics,
    train_dataset=common_voice_train,
    eval_dataset=common_voice_test,
    tokenizer=processor.feature_extractor
)
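For context, here is a rough sketch of what these arguments imply for the effective batch size and the length of the schedule. The training-set size (100,000 clips) and the single GPU are placeholder assumptions rather than real numbers, the helper name `training_schedule` is made up, and the step arithmetic only approximates what the Trainer does internally, which can vary between transformers versions.

```python
import math


def training_schedule(num_train_samples, per_device_batch_size=8,
                      grad_accum_steps=8, num_devices=1,
                      num_epochs=30, warmup_steps=500):
    """Estimate optimizer steps implied by the TrainingArguments above."""
    # Samples consumed per optimizer update.
    effective_batch = per_device_batch_size * grad_accum_steps * num_devices
    # Dataloader batches per epoch, then optimizer updates per epoch.
    batches_per_epoch = math.ceil(
        num_train_samples / (per_device_batch_size * num_devices))
    steps_per_epoch = max(batches_per_epoch // grad_accum_steps, 1)
    total_steps = steps_per_epoch * num_epochs
    return {
        "effective_batch": effective_batch,
        "steps_per_epoch": steps_per_epoch,
        "total_steps": total_steps,
        "warmup_fraction": warmup_steps / total_steps,
    }


# ASSUMPTION: 100_000 training clips on one GPU, purely for illustration.
schedule = training_schedule(num_train_samples=100_000)
print(schedule)
```

With these placeholder numbers each update sees 64 clips and the 500 warmup steps cover roughly 1% of training, so the learning rate of 3e-4 is reached very early; a different GPU count or dataset size changes both figures.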
Are these parameters correct? Do you have any suggestions?
Thanks!