
Difficulties finetuning for another language

Open lauraminkova opened this issue 1 year ago • 12 comments

Hi there!

First of all, thank you so much for all of your work and the time put into answering everyone's questions in the Issues section!

I've been trying to finetune Donut for French visual question answering, but have encountered lots of issues.

My initial thought process:

  1. Create French SynthDoG data (No problem here)
  2. Finetune donut-base using SynthDoG_fr data (using this notebook as a basis https://github.com/NielsRogge/Transformers-Tutorials/blob/master/Donut/CORD/Fine_tune_Donut_on_a_custom_dataset_(CORD)_with_PyTorch_Lightning.ipynb)
  3. Finetune donut-SynthDog_fr model on French documents for visual question answering (using this notebook as a basis https://github.com/NielsRogge/Transformers-Tutorials/blob/master/Donut/DocVQA/Fine_tune_Donut_on_DocVQA.ipynb)

To account for the change in language, I changed the config_fr.yaml for SynthDoG to use a French font and a French corpus (2.4M). I also swapped the tokenizer in both finetuning steps for a French one (and checked that it works - it does!). I also read in #11 that I should maybe change the decoder to one that handles French better, so I did that as well.
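For anyone reproducing this setup, one thing that may be worth ruling out after swapping both the tokenizer and the decoder is a size or special-token mismatch between the two. The sketch below is only an illustration: `find_alignment_problems` is a hypothetical helper (not part of Donut or Transformers), and it works on plain integers so it can be run without loading any model.

```python
from typing import List, Optional


def find_alignment_problems(
    tokenizer_size: int,
    embedding_rows: int,
    pad_id: Optional[int],
    start_id: Optional[int],
) -> List[str]:
    """Hypothetical helper: list mismatches between a tokenizer and a decoder.

    An empty list means the basic sizes and special-token ids are consistent.
    It does not prove that the model is set up correctly.
    """
    problems = []

    # Every token id the tokenizer can emit needs a row in the embedding matrix.
    if embedding_rows < tokenizer_size:
        problems.append(
            f"decoder has {embedding_rows} embedding rows "
            f"but the tokenizer has {tokenizer_size} tokens"
        )

    # The ids stored in the model config must exist in the new tokenizer.
    for name, token_id in (("pad", pad_id), ("decoder start", start_id)):
        if token_id is None:
            problems.append(f"{name} token id is not set")
        elif not 0 <= token_id < tokenizer_size:
            problems.append(f"{name} token id {token_id} is outside the vocabulary")

    return problems


# Illustrative numbers only, not taken from the setup described above.
print(find_alignment_problems(32005, 32000, None, 32004))
```

With Hugging Face Transformers, the four inputs would presumably come from `len(tokenizer)`, `model.decoder.get_input_embeddings().num_embeddings`, `model.config.pad_token_id`, and `model.config.decoder_start_token_id`; whether any of them is actually off in this case is not known from the description above.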

Despite these changes, I still get rather poor metrics both when pre-training with SynthDoG_fr (the lowest val_edit_distance is ~0.74 after 30 epochs) and when finetuning on French documents for VQA (though this is unsurprising given the pre-training results). The predictions are visibly bad as well: the model usually outputs gibberish.
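To put the ~0.74 figure in context, here is a minimal sketch of a normalized edit distance, assuming `val_edit_distance` is the Levenshtein distance divided by the length of the longer string, as the linked notebooks appear to compute it. The function names are illustrative and the implementation is plain Python rather than the library call used in the notebooks.

```python
def levenshtein(a: str, b: str) -> int:
    """Number of single-character insertions, deletions, or substitutions."""
    if len(a) < len(b):
        a, b = b, a  # keep the shorter string in the inner loop
    previous = list(range(len(b) + 1))
    for i, ch_a in enumerate(a, start=1):
        current = [i]
        for j, ch_b in enumerate(b, start=1):
            cost = 0 if ch_a == ch_b else 1
            current.append(
                min(
                    previous[j] + 1,         # deletion
                    current[j - 1] + 1,      # insertion
                    previous[j - 1] + cost,  # substitution or match
                )
            )
        previous = current
    return previous[-1]


def normalized_edit_distance(pred: str, target: str) -> float:
    """0.0 means identical strings, 1.0 means nothing lines up."""
    longest = max(len(pred), len(target))
    if longest == 0:
        return 0.0
    return levenshtein(pred, target) / longest


# One missing character out of five
print(normalized_edit_distance("chat", "chats"))  # 0.2
```

Under that assumption, a score of ~0.74 means roughly three out of four characters would need to be edited, which is consistent with predictions that look like gibberish.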

Am I missing anything? Any help would be greatly appreciated!

lauraminkova, Aug 08 '23 15:08