OFA
Hyperparameters of the OFA_HUGE model for finetuning on captioning
Thanks for this great paper and for maintaining this repo so well. I need some details on the hyper-parameters for the captioning task.
The bash training scripts in run_scripts contain hyperparameter details for the base and large models, but there is only an evaluation script for the ofa_huge model. I finetuned ofa_huge on a custom dataset using the parameters in the train_caption_stage1.sh file, but its performance was poor compared to ofa_large with the same parameters.
Can you please shed some light on this, and maybe share the bash file with the original parameters used for ofa_huge?
I think the problem might mainly come from the position embedding of the image. See this:
To give a short history: I pretrained the huge model at a resolution of 256×256 first, then continued pretraining at the larger resolution of 480×480, but with the above-mentioned interpolation. I suggest trying two solutions to see if the problem is solved:
- Just change `patch_image_size` to 256, i.e., use the lower pretraining resolution.
- Pass a new parameter `--orig-patch-image-size=256`, which means you are using interpolation for the image positional embedding.

By the way, I have not tried it on captioning with only 4 GPUs. My successful attempts were on 32 GPUs, with bz=4 and ga=2 per worker, so the total batch size was 256.
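To make the two options concrete, here is a rough sketch of how they might slot into a train_caption_stage1.sh-style launch. Only `patch_image_size` and `--orig-patch-image-size` come from this thread; the other variable names are placeholders, not the actual script's.

```shell
# Option 1: finetune at the lower pretraining resolution.
patch_image_size=256

# Option 2: keep 480 but interpolate the image positional embedding
# from the original 256 pretraining resolution (flag from the thread).
patch_image_size=480
extra_args="--orig-patch-image-size=256"

# Batch-size math from the thread: 32 workers x bz=4 x ga=2.
GPUS=32
batch_size=4      # bz per worker
update_freq=2     # ga (gradient accumulation) per worker
total_batch=$((GPUS * batch_size * update_freq))
echo "$total_batch"   # 256
```

If you only have 4 GPUs, you would need a much larger `update_freq` to approach the same effective batch size.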
Later I'll release the bash file after a double check.
BTW, the code is in unify_transformer.py
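For intuition, here is a minimal NumPy sketch of the kind of 2D positional-embedding interpolation being described; it is not the actual code in unify_transformer.py. It assumes a hypothetical 16-pixel patch, so a 256×256 image gives a 16×16 grid of patch position embeddings and a 480×480 image needs a 30×30 grid.

```python
import numpy as np

def interpolate_pos_embed(pos_embed, orig_size, new_size):
    """Bilinearly resize a (orig_size*orig_size, dim) grid of image
    positional embeddings to (new_size*new_size, dim)."""
    dim = pos_embed.shape[-1]
    grid = pos_embed.reshape(orig_size, orig_size, dim)
    # Target sample coordinates in the source grid.
    coords = np.linspace(0.0, orig_size - 1, new_size)
    lo = np.floor(coords).astype(int)
    hi = np.minimum(lo + 1, orig_size - 1)
    frac = coords - lo
    # Separable bilinear: interpolate along rows, then along columns.
    rows = grid[lo] * (1 - frac)[:, None, None] + grid[hi] * frac[:, None, None]
    out = rows[:, lo] * (1 - frac)[None, :, None] + rows[:, hi] * frac[None, :, None]
    return out.reshape(new_size * new_size, dim)

orig = np.random.randn(16 * 16, 8)          # 256/16 = 16x16 patch grid
new = interpolate_pos_embed(orig, 16, 30)   # 480/16 = 30x30 patch grid
print(new.shape)  # (900, 8)
```

Without this step, embeddings pretrained at 256×256 don't line up with the 480×480 patch grid, which is why skipping the interpolation can hurt finetuning quality.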
@JustinLin610 Thanks for the prompt response. I'll look into your suggestions.
@abisekrk Do you have any updates on this? How was the performance?
@willxxy There was an increase in performance when using the lower image resolution, but OFA large still performed much better. By the way, these are all stage 1 results.
@JustinLin610 I'm using the same parameters for OFA huge that I used for OFA large. Is there anything else I should change? If possible, can you share the training script?
Hi @JustinLin610, Can you release the bash file for OFA huge training with all the params if possible?
Sorry for my late response. You mean finetuning parameters for OFA huge image captioning?
@JustinLin610 Yes. Please share the bash file with params for finetuning the OFA huge model for image captioning.