Hyperparameter Tuning Strategies

Open · shrutijpalaskar opened this issue 3 years ago · 1 comment

Hi Jaemin,

Thanks for the very interesting paper and releasing your codebase!

I have been working with your codebase for a different multimodal text generation task and observe lower performance with VL-T5 and VL-BART than other similar models. I think this might be a hyperparameter tuning issue. Do you have any advice on which particular parameters might be beneficial to tune? I am currently following the Multi30K settings for the learning rate and number of epochs from Table 14 in your paper.

shrutijpalaskar · Jul 26 '21 20:07

Hi @shrutijpalaskar. Since I had to run all pretraining/finetuning experiments on a 4 x 10GB RTX 2080 Ti server (much smaller than what recent works from big companies use), I couldn't run a wide hyperparameter search, which means the current hyperparameters are under-tuned and might be far from optimal. I guess the VL-T5/VL-BART models could achieve higher scores on benchmarks with better hyperparameters. In my experiments, I didn't observe much difference when tuning parameters (e.g., batch size, learning rate, epochs) during finetuning. I did find improvements from longer pretraining (10 epochs -> 30 epochs; I didn't have time to explore longer) and bigger backbone architectures (e.g., t5-small -> t5-base), which are kind of obvious. What is your target multimodal text generation task?

j-min · Jul 26 '21 21:07
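
For anyone landing here with the same question: the finetuning parameters mentioned above (batch size, learning rate, epochs) can be swept with a small grid. The following is only a minimal sketch, not code from the VL-T5 repository. The `grid_search` helper, the value ranges, and `dummy_train_and_eval` are all hypothetical; in practice you would replace the dummy function with something that launches a finetuning run and returns a validation score.

```python
from itertools import product


def grid_search(train_and_eval, grid):
    """Evaluate every combination in `grid`.

    `train_and_eval` is any callable that takes a config dict and
    returns a score where higher is better.
    Returns (best_config, best_score, all_results).
    """
    names = list(grid)
    results = []
    for values in product(*(grid[name] for name in names)):
        config = dict(zip(names, values))
        results.append((config, train_and_eval(config)))
    best_config, best_score = max(results, key=lambda item: item[1])
    return best_config, best_score, results


# Hypothetical search space; these values are illustrative assumptions,
# not the settings from the paper.
grid = {
    "lr": [1e-5, 5e-5, 1e-4],
    "batch_size": [16, 32],
    "epochs": [10, 20],
}


def dummy_train_and_eval(config):
    # Placeholder standing in for a real finetuning run plus validation.
    # It simply favors lr = 5e-5 and more epochs.
    return -abs(config["lr"] - 5e-5) * 1e4 + 0.01 * config["epochs"]


if __name__ == "__main__":
    best_config, best_score, results = grid_search(dummy_train_and_eval, grid)
    print(f"tried {len(results)} configs, best: {best_config}")
```

Given the reply above, a sweep like this may yield only small gains for finetuning; longer pretraining and a larger backbone were reported to matter more.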