
Finetuning on InfographicVQA

Open Caixin89 opened this issue 1 year ago • 12 comments

I was unable to achieve the result shown in the UDOP paper.

I used the udop-unimodel-large-224 checkpoint.

My ANLS score is 0.407903. This is nowhere near the 0.461 shown in the table below, taken from the paper.

[image: Table 8 from the UDOP paper]

Since I noticed that the batch size, warmup steps, and weight decay given in https://github.com/microsoft/i-Code/blob/main/i-Code-Doc/scripts/finetune_duebenchmark.sh differ from those reported in the paper, I also tried changing the finetuning script to use the paper's settings.

[image: the paper's finetuning hyperparameter settings]

Lastly, I also tried adding the task prompt prefix, since the existing code does not do so. I followed the approach from https://github.com/microsoft/i-Code/issues/71#issuecomment-1623201208
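For reference, the change amounts to something like this (a minimal sketch; the exact prefix string and the `add_task_prefix` name are my own illustration, not code from the repo or the linked comment):

```python
def add_task_prefix(question: str,
                    task: str = "question answering on InfographicVQA") -> str:
    """Prepend a task prompt to the question before tokenization.

    The prefix string here is illustrative only; the actual wording
    used in issue #71 may differ.
    """
    return f"{task}. {question}"
```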

Results of the 3 different finetuning configurations:

| Task prefix | Hyperparameter settings | ANLS score |
| --- | --- | --- |
| No | Unchanged finetuning script | 0.407903 |
| No | Paper's settings | 0.40174 |
| Yes | Unchanged finetuning script | 0.408355 |

Other changes I made:

  • Changed to PyTorch's AdamW, based on https://github.com/microsoft/i-Code/issues/63#issuecomment-1608019905

    Within baselines-master in the due-benchmark repo:

    • Applied the fix from https://github.com/due-benchmark/baselines/issues/7#issue-1638167863
    • In baselines-master/benchmarker/data/utils.py, I changed the dtype of label_name from U100 to U1024 to prevent questions from being truncated during display
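For context, the relevant difference is that torch.optim.AdamW applies weight decay directly to the parameters, decoupled from the gradient-based Adam step. A minimal scalar sketch of one such update (my own illustration, not the repo's or PyTorch's actual code):

```python
import math

def adamw_step(param, grad, state, lr=1e-4, betas=(0.9, 0.999),
               eps=1e-8, weight_decay=0.01):
    """One scalar AdamW update (Loshchilov & Hutter style).

    Weight decay shrinks the parameter directly, decoupled from the
    Adam moment-based step -- the behaviour of torch.optim.AdamW.
    `state` holds the step count and the two moment estimates.
    """
    state["t"] += 1
    b1, b2 = betas
    state["m"] = b1 * state["m"] + (1 - b1) * grad          # first moment
    state["v"] = b2 * state["v"] + (1 - b2) * grad * grad   # second moment
    m_hat = state["m"] / (1 - b1 ** state["t"])             # bias correction
    v_hat = state["v"] / (1 - b2 ** state["t"])
    param -= lr * weight_decay * param                      # decoupled decay
    param -= lr * m_hat / (math.sqrt(v_hat) + eps)          # Adam step
    return param
```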

Please assist

Caixin89 avatar Feb 16 '24 02:02 Caixin89

May I know if the results shown in Table 8 above are validation-set or test-set scores?

Caixin89 avatar Feb 16 '24 02:02 Caixin89

Table 8 shows validation results. May I know how many epochs you trained the model for, and which checkpoint you used?

zinengtang avatar Feb 17 '24 08:02 zinengtang

4 epochs for the 2 runs that used the unchanged finetuning script; 5 epochs when I changed the finetuning script to the paper's settings.

The number of epochs is decided automatically based on early_stopping_patience=20.
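Roughly, the patience logic works like this (a minimal sketch in the spirit of HuggingFace's EarlyStoppingCallback; the class name is illustrative, not the actual trainer code):

```python
class EarlyStopping:
    """Stop once the validation metric has failed to improve for
    `patience` consecutive evaluations (here: lower loss is better)."""

    def __init__(self, patience: int = 20, min_delta: float = 0.0):
        self.patience = patience
        self.min_delta = min_delta
        self.best = float("inf")
        self.bad_evals = 0

    def step(self, val_loss: float) -> bool:
        """Record one evaluation; return True when training should stop."""
        if val_loss < self.best - self.min_delta:
            self.best = val_loss
            self.bad_evals = 0
        else:
            self.bad_evals += 1
        return self.bad_evals >= self.patience
```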

Caixin89 avatar Feb 19 '24 02:02 Caixin89

I am assuming you are using the last checkpoint the run generated rather than an intermediate checkpoint? If so, try using more epochs. If it still doesn't work, I will provide a finetuned checkpoint to see whether the issue is in the evaluation script.

zinengtang avatar Feb 19 '24 09:02 zinengtang

Sure, I can try that. In the meantime, could you share the number of epochs you used for finetuning?

Caixin89 avatar Feb 20 '24 01:02 Caixin89

[image: plot of validation loss against training steps]

The above is a plot of the validation loss against training steps. The validation loss increases consistently across training steps.

Is this expected?

Caixin89 avatar Feb 20 '24 04:02 Caixin89


May I ask how you implemented the ANLS metric for the task?

Pietro1999IT avatar Feb 27 '24 09:02 Pietro1999IT


> May I ask how you implemented the ANLS metric for the task?

It should be in this repo: https://github.com/due-benchmark/evaluator/tree/master

yuanzheng625 avatar Feb 27 '24 18:02 yuanzheng625

Yes, I used ANLS from https://github.com/due-benchmark/evaluator/tree/master.
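For reference, ANLS scores each prediction by its best normalized Levenshtein similarity over the gold answers, zeroing any similarity whose normalized distance reaches the 0.5 threshold. A minimal sketch of the metric (my own re-implementation for illustration, not the evaluator repo's code):

```python
def levenshtein(a: str, b: str) -> int:
    """Classic dynamic-programming edit distance."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,          # deletion
                           cur[j - 1] + 1,       # insertion
                           prev[j - 1] + (ca != cb)))  # substitution
        prev = cur
    return prev[-1]

def anls(prediction: str, gold_answers, tau: float = 0.5) -> float:
    """ANLS for one question: best similarity over the gold answers,
    where similarities with normalized distance >= tau score zero."""
    best = 0.0
    for gold in gold_answers:
        p, g = prediction.strip().lower(), gold.strip().lower()
        if not p and not g:
            s = 1.0
        else:
            nl = levenshtein(p, g) / max(len(p), len(g))
            s = 1.0 - nl if nl < tau else 0.0
        best = max(best, s)
    return best
```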

Caixin89 avatar Mar 04 '24 02:03 Caixin89


I have tried with 10 epochs and my ANLS is still ~0.41. Am I supposed to finetune for even more epochs?

Could you provide me with your finetuned checkpoint?

Caixin89 avatar Mar 04 '24 02:03 Caixin89

Also, I would like to double-check that the 46.1 ANLS score is indeed based on finetuning the udop-unimodel-large-224 checkpoint, without additional supervised pre-training.

Correct?

Caixin89 avatar Mar 04 '24 03:03 Caixin89


@zinengtang Any updates?

Caixin89 avatar Apr 08 '24 04:04 Caixin89