
Reproducing TGIF-QA performance

Status: Open • prote376 opened this issue 4 years ago • 4 comments

Hello. Thank you for releasing this code.

Following the settings described in the paper and the instructions in this GitHub repository, I am trying to reproduce the results on the TGIF-QA FrameQA task.

However, I can't reach the performance reported in the paper, although I have tried many different settings.

It seems that the parameters in my config file differ from the original ones.

Could you share the config file you used for the TGIF-QA FrameQA task?

prote376, May 06 '21 11:05

Hi @prote376,

We have the TGIF-QA FrameQA config available here: https://github.com/jayleicn/ClipBERT/blob/main/src/configs/tgif_qa_frameqa_base_resnet50.json. If I remember correctly, we trained with this config on 4 GPUs. If you are using a different number of GPUs, the performance might differ somewhat.

Best, Jie

jayleicn, May 06 '21 12:05

Thank you for the quick reply.

First, I used the config file with 4 GPUs.

Next, I found that some parameters in the config differ from those reported in the paper (GitHub config vs. paper):

  • max image size: 768 vs 448
  • batch size: 16 vs 32
  • learning rate: 5e-5 vs 1e-4
  • number of GPUs: 4 vs 8
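Part of the difference between the two settings may come from the effective (global) batch size, which the reply above notes depends on the number of GPUs. The sketch below computes it under two assumptions not confirmed in this thread: that the listed batch sizes are per GPU, and that no gradient accumulation is used. It also shows what the common linear learning-rate scaling heuristic (Goyal et al., 2017) would suggest; that heuristic is a general rule of thumb, not the procedure used for ClipBERT. The class and function names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class TrainSetting:
    """Hyperparameters quoted in this thread (per-GPU batch size is an ASSUMPTION)."""
    name: str
    per_gpu_batch_size: int
    n_gpus: int
    learning_rate: float
    grad_accum_steps: int = 1  # ASSUMPTION: no gradient accumulation

    @property
    def effective_batch_size(self) -> int:
        # Number of examples that contribute to each optimizer step.
        return self.per_gpu_batch_size * self.n_gpus * self.grad_accum_steps


def linearly_scaled_lr(base: TrainSetting, target_batch_size: int) -> float:
    """Linear scaling heuristic: scale the LR by the ratio of effective batch sizes."""
    return base.learning_rate * target_batch_size / base.effective_batch_size


github = TrainSetting("github config", per_gpu_batch_size=16, n_gpus=4, learning_rate=5e-5)
paper = TrainSetting("paper", per_gpu_batch_size=32, n_gpus=8, learning_rate=1e-4)

if __name__ == "__main__":
    for s in (github, paper):
        print(f"{s.name}: effective batch {s.effective_batch_size}, lr {s.learning_rate:g}")
    print(f"linearly scaled lr for batch {paper.effective_batch_size}: "
          f"{linearly_scaled_lr(github, paper.effective_batch_size):g}")
```

Under these assumptions, the paper setting has a 4x larger effective batch size (256 vs 64) but only a 2x larger learning rate than the GitHub config, so the two are not simple linear rescalings of each other.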

So I tried the parameters from the paper, as well as a few other settings. However, I only got an accuracy of 56-58 with N_test=1 and 57-59 with N_test=16, compared with 59.4 and 60.3 in the paper.
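For context on the two N_test numbers: as described in the ClipBERT paper, N_test clips are sampled per video at inference and their predictions are aggregated. The sketch below assumes the aggregation is a mean over per-clip answer logits followed by an argmax; the function names and toy logits are hypothetical, and the repository's actual consensus code may differ.

```python
from typing import List, Sequence


def aggregate_clip_logits(clip_logits: Sequence[Sequence[float]]) -> List[float]:
    """Average answer logits over N_test clips (ASSUMED mean-pooling consensus)."""
    n_test = len(clip_logits)
    if n_test == 0:
        raise ValueError("need at least one clip")
    n_answers = len(clip_logits[0])
    return [sum(clip[a] for clip in clip_logits) / n_test for a in range(n_answers)]


def predict(clip_logits: Sequence[Sequence[float]]) -> int:
    """Return the index of the highest-scoring answer after aggregation."""
    video_logits = aggregate_clip_logits(clip_logits)
    return max(range(len(video_logits)), key=video_logits.__getitem__)


# Toy example: 3 clips (N_test=3), 4 candidate answers; values are made up.
toy_logits = [
    [2.0, 0.5, 0.1, 0.0],
    [0.2, 1.8, 0.3, 0.0],
    [1.9, 1.0, 0.2, 0.1],
]
```

Using only the second clip (N_test=1), `predict([toy_logits[1]])` picks answer 1, while averaging all three clips, `predict(toy_logits)`, picks answer 0. This illustrates how aggregating more clips can change individual predictions.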

Could you check whether this setting is correct when you have time?

prote376, May 06 '21 13:05

I have added my coauthor Linjie (@linjieli222), who conducted this experiment.

Hi Linjie,

Could you help us verify the configurations? Thanks!

Best, Jie

jayleicn, May 06 '21 13:05

Hi, I got a batch size mismatch under the 'action' setting. I think the problem may be that the code concatenates the question with each of the options, i.e. 'n_options' times per example, so the text batch size differs from the batch size of the visual embeddings. The relevant code is in src/dataset/dataset_video_qa.py:

    text_str_list = flat_list_of_lists(
        [[d["q_str"] + " " + d["options_str_list"][i] for i in range(self.n_options)]
         for d in text_examples]
    )  # (B * n_options, )
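The shape relationship described above can be illustrated in plain Python. The sketch assumes the visual side carries one entry per example (B entries) while the text side has B * n_options entries, and it shows one common way to reconcile them: repeating each example's visual entry n_options times in the same example-major order as the text list. `flat_list_of_lists` is re-implemented here as a stand-in for the repository helper; `build_text_inputs`, `repeat_per_option`, and the toy data are hypothetical, and this is not necessarily how ClipBERT itself handles the 'action' task.

```python
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def flat_list_of_lists(lists: Sequence[Sequence[T]]) -> List[T]:
    """Stand-in for the repo helper of the same name: flatten one level."""
    return [item for sub in lists for item in sub]


def build_text_inputs(text_examples: Sequence[dict], n_options: int) -> List[str]:
    """Mirror of the quoted snippet: one "question + option" string per option."""
    return flat_list_of_lists(
        [[d["q_str"] + " " + d["options_str_list"][i] for i in range(n_options)]
         for d in text_examples]
    )  # length B * n_options


def repeat_per_option(visual_batch: Sequence[T], n_options: int) -> List[T]:
    """HYPOTHETICAL fix: repeat each example's visual entry n_options times,
    matching the example-major order produced by build_text_inputs."""
    return [v for v in visual_batch for _ in range(n_options)]


# Toy batch: B = 2 examples, 3 options each (placeholder strings, not real features).
examples = [
    {"q_str": "what does the man do", "options_str_list": ["run", "jump", "sit"]},
    {"q_str": "what does the cat do", "options_str_list": ["eat", "sleep", "play"]},
]
visual = ["video_0_feats", "video_1_feats"]
n_options = 3
texts = build_text_inputs(examples, n_options)         # 6 strings
aligned_visual = repeat_per_option(visual, n_options)  # 6 entries, same order
```

With PyTorch tensors, the same expansion along the batch dimension corresponds to `torch.repeat_interleave(visual, n_options, dim=0)`.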

Can you help me with this problem? Thanks!

ByZ0e, Jan 23 '22 15:01