The results about multi-modal dialogue retrieval on PhotoChat in PaCE
Hello, when I follow the quickstart steps, I am confused about how to obtain the final multi-modal dialogue retrieval results on PhotoChat in PaCE. As far as I can tell, the evaluation script computes scores on the validation set. Am I right? And when I change the split from 'val' to 'test', I only get lower scores than the ones reported in the paper. Hope for your help! Best regards.
Hello, why do you think the evaluation script evaluates on the validation set? I'm a little confused. Can you provide more details~
Thanks for your reply! When the script is executed on the PhotoChat dataset, it uses the `compute_old_irtr_recall()` method to calculate the scores, and that method constructs the text dataset with `text_dset = pl_module.trainer.datamodule.dms[0].make_no_false_val_dset()`, which returns validation-set data:
```python
def make_no_false_val_dset(self, image_only=False, image_list=None, image_dir=None):
    if image_list is None:
        return self.dataset_cls_no_false(
            self.data_dir,
            self.val_transform_keys,
            split="val",
            image_size=self.image_size,
            max_text_len=self.max_text_len,
            draw_false_image=0,
            draw_false_text=0,
            image_only=image_only,
            max_image_len=self.max_image_len,
            use_segment_ids=self.use_segment_ids,
            mask_prob=self.mask_prob,
            max_pred_len=self.max_pred_len,
            whole_word_masking=self.whole_word_masking,
            mask_source_words=self.mask_source_words,
            max_source_len=self.max_source_len,
        )
    ...
```
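In other words, the helper always passes `split="val"`. A minimal self-contained sketch of what I mean (here `make_no_false_dset` and `DummyDataset` are hypothetical names for illustration, not actual PaCE code) would be to expose the split as a parameter so the caller can request the test data:

```python
# Sketch only: the real helper hardcodes split="val", so retrieval
# evaluation always reads the validation data. Exposing the split as a
# parameter would let the caller request "test" instead.
# make_no_false_dset and DummyDataset are hypothetical names.
def make_no_false_dset(dataset_cls, split="val", **kwargs):
    # Forward the chosen split to the dataset constructor.
    return dataset_cls(split=split, **kwargs)

class DummyDataset:
    def __init__(self, split, **kwargs):
        self.split = split

print(make_no_false_dset(DummyDataset, split="test").split)
```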
Hope for your help~
Oh, I see. Our project is developed based on ViLT, so we used the same tricks [vilt-datamodule] [vilt-coco-dataset] as they do. If you replace `photochat_context_dev` with `photochat_context_test` in `class PhotochatDataset(BaseDataset)`, what result do you get?
Additionally, you might want to try using a longer context when constructing the data. (https://github.com/AlibabaResearch/DAMO-ConvAI/blob/adcb4950b123eb70266201cb5c0e10894658ec97/pace/pace/utils/write_photochat.py#L46)
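For anyone following along, the swap suggested above amounts to changing which arrow-file name `PhotochatDataset` loads for the evaluation split. A minimal sketch (the mapping function and the `photochat_context_train` entry are illustrative; only the `photochat_context_dev`/`photochat_context_test` names come from this thread):

```python
def photochat_arrow_names(split):
    """Illustrative mapping from split name to PhotoChat arrow file names.

    Not actual PaCE code; it only shows where the dev->test swap happens.
    """
    names = {
        "train": ["photochat_context_train"],  # hypothetical entry
        "val": ["photochat_context_dev"],
        # Point the evaluation split at the test arrows instead of dev:
        "test": ["photochat_context_test"],
    }
    return names[split]

print(photochat_arrow_names("test"))
```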