ClipBERT
Reproducing TGIF-QA performance
Hello. Thank you for releasing this code.
Following the settings described in the paper and the steps in this repository, I am trying to reproduce the results on the TGIF-QA frameQA task.
However, I can't reach the performance reported in the paper, although I have tried many different settings.
It seems that the parameters in my config file differ from the original ones.
Could you share the config file for the TGIF-QA frameQA task you used?
Hi @prote376,
We have the TGIF-QA frameQA config available here: https://github.com/jayleicn/ClipBERT/blob/main/src/configs/tgif_qa_frameqa_base_resnet50.json. If I remember correctly, we trained with this config on 4 GPUs. If you are using a different number of GPUs, there might be some performance difference.
Best, Jie
Thank you for the quick reply.
At first, I used the config file with 4 GPUs.
Next, I found that some parameters were different from those in the paper.
- github vs paper
- max image size: 768 vs 448
- batch size: 16 vs 32
- learning rate: 5e-5 vs 1e-4
- the number of gpus: 4 vs 8
So I tried the parameters from the paper. In addition, I tried a few more settings. But I only got 56-58 with N_test=1 and 57-59 with N_test=16 (the paper reports 59.4 and 60.3).
Could you check whether these settings are correct when you have time?
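For reference, the mismatch listed above can be written down as a small comparison. This is only a sketch: the values are the ones quoted in this post, while the key names (`max_img_size`, `batch_size`, `learning_rate`, `n_gpus`) are hypothetical labels chosen for illustration and may not match the keys in the repository's JSON config.

```python
# Values quoted in this thread. The key names are hypothetical labels,
# not necessarily the keys used in the repository's JSON config.
github_cfg = {"max_img_size": 768, "batch_size": 16, "learning_rate": 5e-5, "n_gpus": 4}
paper_cfg = {"max_img_size": 448, "batch_size": 32, "learning_rate": 1e-4, "n_gpus": 8}


def diff_settings(a, b):
    """Return {key: (a_value, b_value)} for every key whose values differ."""
    keys = sorted(set(a) | set(b))
    return {k: (a.get(k), b.get(k)) for k in keys if a.get(k) != b.get(k)}


differences = diff_settings(github_cfg, paper_cfg)
for key, (github_value, paper_value) in differences.items():
    print(f"{key}: github={github_value} paper={paper_value}")
```

Whether the batch size is counted per GPU or in total is not stated in this thread, so the sketch compares the raw numbers only.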
I have added my coauthor Linjie @linjieli222 who conducted this experiment.
Hi Linjie,
Could you help us to verify the configurations? Thanks!
Best, Jie
Hi, I get a batch size mismatch under the 'action' setting. The probable cause I found in the code is that the question is concatenated with each of the options, 'n_options' times, which makes the text batch size differ from the batch size of the visual embeddings. The related code is in src/dataset/dataset_video_qa.py:
text_str_list = flat_list_of_lists(
    [[d["q_str"] + " " + d["options_str_list"][i]
      for i in range(self.n_options)]
     for d in text_examples]
)  # (B * n_options, )
Can you help me with this problem? Thanks!
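To make the shape issue concrete, here is a minimal pure-Python sketch of the flattening above. It assumes `flat_list_of_lists` simply concatenates the inner lists (re-implemented here, not copied from the repository), uses `n_options = 5` and made-up strings as placeholders, and shows one common way to align the two sides by repeating each visual entry `n_options` times. Whether the repository already does this elsewhere is not confirmed in this thread.

```python
def flat_list_of_lists(list_of_lists):
    """Assumed behaviour of the helper: concatenate the inner lists in order."""
    return [item for sublist in list_of_lists for item in sublist]


n_options = 5  # placeholder value for illustration
batch_size = 2

# Made-up examples with the same fields as in the quoted line.
text_examples = [
    {"q_str": f"question {b}",
     "options_str_list": [f"option {i}" for i in range(n_options)]}
    for b in range(batch_size)
]

# Same expression as in the quoted code, with self.n_options replaced by n_options.
text_str_list = flat_list_of_lists(
    [[d["q_str"] + " " + d["options_str_list"][i] for i in range(n_options)]
     for d in text_examples]
)  # length B * n_options

# Stand-ins for per-video visual features: only B entries.
visual_feats = [f"clip {b}" for b in range(batch_size)]

# One possible alignment: repeat each visual entry n_options times, so that
# row k of the text batch pairs with row k of the visual batch.
visual_repeated = [v for v in visual_feats for _ in range(n_options)]

print(len(text_str_list), len(visual_feats), len(visual_repeated))
```

With real tensors the same repetition would have to happen along the batch dimension before the text and visual inputs are combined.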