visdial-bert: Results from the test server don't match the reported results

Open C0desoda opened this issue 2 years ago • 0 comments

Hello,

I used download_preprocessed.sh to download all the pretrained state dicts. Then I ran

    python evaluate.py -n_gpus 8 -start_path ./checkpoints-release/basemodel_dense -save_name bestmodel_plus_dense

to generate the submission text file, and I submitted that file to the test server. The server returned:

    {"test-std": {"MRR (x 100)": 13.4486998706508, "R@1": 4.075, "R@5": 18.825, "R@10": 34.1, "Mean": 23.001, "NDCG (x 100)": 30.34857283511257}}

This doesn't match the reported results. I made sure the state dict is loaded correctly.

I also ran

    python evaluate.py -n_gpus 8 -start_path ./checkpoints-release/basemodel_dense_nsp -save_name basemodel_dense_nsp

and submitted that result to the test server as well. It returned:

    {"test-std": {"MRR (x 100)": 9.081483141263188, "R@1": 2.125, "R@5": 10.5, "R@10": 20.7, "Mean": 31.2615, "NDCG (x 100)": 18.93568245023515}}

This doesn't match the reported results either. What do you think might be causing this? Thank you very much.

Sep 29 '22 06:09 C0desoda
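
For reference when reading the numbers above, the sketch below computes the sparse retrieval metrics the server reports (MRR, R@1, R@5, R@10, Mean). It uses the standard definitions over the 1-based rank of the ground-truth answer among the 100 candidate options per question. It is not the evaluation server's code. The function name retrieval_metrics is hypothetical, and NDCG is omitted because it needs the dense relevance annotations. A uniformly random ranking gives R@1 = 1, R@5 = 5, R@10 = 10, Mean = 50.5 and MRR (x 100) of about 5.19. Both submissions score above that chance level but well below a working model.

```python
from typing import Dict, Sequence


def retrieval_metrics(gt_ranks: Sequence[int], num_options: int = 100) -> Dict[str, float]:
    """Sparse VisDial-style retrieval metrics (hypothetical helper, standard definitions).

    gt_ranks holds, for each question, the 1-based rank that the model assigned
    to the ground-truth answer among num_options candidates.
    """
    if not gt_ranks:
        raise ValueError("gt_ranks must be non-empty")
    for r in gt_ranks:
        if not 1 <= r <= num_options:
            raise ValueError(f"rank {r} is outside 1..{num_options}")

    n = len(gt_ranks)

    def recall_at(k: int) -> float:
        # Percentage of questions whose ground-truth answer is ranked in the top k.
        return 100.0 * sum(r <= k for r in gt_ranks) / n

    return {
        # Mean reciprocal rank, scaled by 100 as in the server output.
        "MRR (x 100)": 100.0 * sum(1.0 / r for r in gt_ranks) / n,
        "R@1": recall_at(1),
        "R@5": recall_at(5),
        "R@10": recall_at(10),
        # Average rank of the ground-truth answer (lower is better).
        "Mean": sum(gt_ranks) / n,
    }


if __name__ == "__main__":
    # Chance level: every rank 1..100 occurs exactly once, i.e. a uniform ranking.
    # Mean = 50.5, R@1 = 1.0, R@10 = 10.0
    print(retrieval_metrics(list(range(1, 101))))
```

If a locally computed score on val agrees with the published numbers but the test-std score does not, that points toward the submission file (for example, how the ranks are written out) rather than the checkpoint itself. This is a diagnostic suggestion, not a confirmed cause.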
