
Hyper Parameters of OFA_HUGE model for finetuning to perform captioning

abisekrk opened this issue 2 years ago • 8 comments

Thanks for this great paper and for maintaining this repo so well. I need some details on the hyper-parameters for the captioning task.

The bash training scripts in run_scripts include hyper-parameter details for the base and large models, but there is only an evaluation script for the ofa_huge model. I finetuned ofa_huge on a custom dataset using the parameters in train_caption_stage1.sh, but its performance was poor compared to ofa_large with the same parameters.

Can you please shed some light on this, and maybe share the bash file with the original parameters used for ofa_huge?

abisekrk avatar Aug 24 '22 11:08 abisekrk

I think the problem mainly comes from the positional embedding of the image (the original comment attached a screenshot of the relevant interpolation code). To make a long story short: I pretrained the huge model at a resolution of 256×256 first, then continued pretraining at the larger resolution of 480×480, but with the interpolation mentioned above. I suggest trying two solutions to see if the problem is solved:

  1. Just change patch_image_size to 256, i.e., fine-tune at the lower resolution.
  2. Pass a new parameter --orig-patch-image-size=256, which means you use interpolation for the image positional embedding (a minimal sketch of this kind of interpolation follows this list). By the way, I have not tried captioning with only 4 GPUs. My successful attempts were on 32 GPUs, with batch size 4 and gradient accumulation 2 per worker, so the total batch size is 32 × 4 × 2 = 256.
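
For readers who want to see what such interpolation looks like, here is a minimal, generic PyTorch sketch of resizing a learned 2D image positional embedding to a new resolution. This is an illustration only, not the exact code from unify_transformer.py; the function name and the patch_size default are assumptions.

```python
import torch
import torch.nn.functional as F

def interpolate_image_pos_embed(pos_embed: torch.Tensor,
                                orig_image_size: int,
                                new_image_size: int,
                                patch_size: int = 16) -> torch.Tensor:
    """Resize a flattened (num_patches, dim) positional embedding learned at
    orig_image_size so that it covers the patch grid of new_image_size."""
    old_grid = orig_image_size // patch_size   # e.g. 256 // 16 = 16
    new_grid = new_image_size // patch_size    # e.g. 480 // 16 = 30
    dim = pos_embed.shape[-1]
    # (old_grid * old_grid, dim) -> (1, dim, old_grid, old_grid)
    grid = pos_embed.reshape(old_grid, old_grid, dim).permute(2, 0, 1).unsqueeze(0)
    # Bicubic resize of the 2D grid of embedding vectors.
    grid = F.interpolate(grid, size=(new_grid, new_grid),
                         mode="bicubic", align_corners=False)
    # (1, dim, new_grid, new_grid) -> (new_grid * new_grid, dim)
    return grid.squeeze(0).permute(1, 2, 0).reshape(new_grid * new_grid, dim)
```

With these assumed sizes, going from 256 to 480 resizes a 16×16 grid of embeddings into a 30×30 grid, which is what --orig-patch-image-size=256 implies when fine-tuning at the higher resolution.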

Later I'll release the bash file after a double check.

JustinLin610 avatar Aug 25 '22 01:08 JustinLin610

BTW, the code is in unify_transformer.py

JustinLin610 avatar Aug 25 '22 01:08 JustinLin610

@JustinLin610 Thanks for the prompt response. I'll look into your suggestions.

abisekrk avatar Aug 25 '22 07:08 abisekrk

@abisekrk Do you have any updates on this? How was the performance?

willxxy avatar Sep 15 '22 00:09 willxxy

@willxxy, there was an increase in performance when using the lower image resolution, but OFA large still performed much better. BTW, all of these are stage 1 performance results.

@JustinLin610 I'm using the same parameters for OFA huge that I used for OFA large. Is there anything else I should change? If possible, can you share the training script?

abisekrk avatar Sep 15 '22 09:09 abisekrk

Hi @JustinLin610, can you release the bash file for OFA huge training with all the params, if possible?

abisekrk avatar Oct 06 '22 06:10 abisekrk

> Hi @JustinLin610, can you release the bash file for OFA huge training with all the params, if possible?

Sorry for my late response. You mean the finetuning parameters for OFA huge image captioning?

JustinLin610 avatar Nov 14 '22 06:11 JustinLin610

@JustinLin610 Yes. Please share the bash file with params for finetuning the OFA huge model for image captioning.

abisekrk avatar Nov 14 '22 06:11 abisekrk