Jaemin Cho
Hi, the config file is adapted from the original config of the CLIP-RN50 transformer model (https://github.com/clip-vil/CLIP-ViL/blob/master/CLIP-ViL-Direct/caption/configs/phrase1/transformer.yml). I only changed it to use a larger batch size and fp16 for faster training. Since...
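For reference, the edits were roughly of this shape (a sketch only; the field names follow the CLIP-ViL `transformer.yml` conventions from memory and may not match the actual file exactly):

```yaml
# Sketch of the edited fields, not the full config.
batch_size: 25   # per GPU (8 V100s), larger than the original CLIP-ViL setting
fp16: true       # mixed-precision training for speed
```

Everything else was left as in the original CLIP-ViL config.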
Back then I didn't use wandb, so I don't have log files for that run, sorry.
I just remember that I actually ran the original CLIP-ViL training script to train the MLE model. Could you please run with the same batch size=10 for 25 epochs following...
For multi-GPU training, I guess you could get similar performance with fewer warmup steps, such as 1000 steps.
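To illustrate what I mean by shortening warmup (the function below is just a sketch of a standard linear warmup, not code from this repo; the 1000-step value is the suggestion above, and you would plug the equivalent logic into whatever LR scheduler your trainer uses):

```python
def warmup_lr(step: int, base_lr: float, warmup_steps: int = 1000) -> float:
    """Linearly ramp the learning rate over the first `warmup_steps` steps.

    With multiple GPUs the effective batch size is larger, so a shorter
    warmup (e.g. 1000 steps instead of a single-GPU default) may suffice.
    """
    if step < warmup_steps:
        return base_lr * (step + 1) / warmup_steps
    return base_lr
```

For example, halfway through a 1000-step warmup the LR is half of `base_lr`, and after warmup it stays at `base_lr`.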
Here I attach the output.log for the CIDEr run. I used the same configuration (8 V100s, batch size 25 per GPU) as the current config file. [cider_output.log](https://github.com/j-min/CLIP-Caption-Reward/files/9466584/cider_output.log)
It looks like the METEOR evaluation is not properly set up in the [language_evaluation package](https://github.com/bckim92/language-evaluation). Have you run `python -c "import language_evaluation; language_evaluation.download('coco')"` as mentioned in [README #Setup](https://github.com/j-min/VL-T5/blob/main/README.md#setup)?
Just found a bug in the data preprocessing file and fixed it: https://github.com/ctr4si/A-Hierarchical-Latent-Structure-for-Variational-Conversation-Modeling/commit/e0d70c5b19ec724dd404ca3c94335417a4722068 Please check if it works now.
Could you also upload the logs of 1) the last 10 epochs of the above training and 2) the new 12-epoch training?