
Does the repo pick the weights that perform best on the val dataset to evaluate on the test dataset?

pierowu opened this issue on Oct 18 '23 · 2 comments

Thank you for your solid work. Does the repo implement the logic that picks the model weights that perform best on the val dataset and then evaluates them on the test dataset? From the code below, it seems that the repo directly takes the best results on the test dataset as the final results: https://github.com/eric-ai-lab/PEViT/blob/be6fb43ff54adeeffe720c663dd238976070558e/vision_benchmark/evaluation/lora_clip.py#L284-L291
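For concreteness, the pattern being asked about looks roughly like this (a minimal runnable sketch with stand-in helpers, not the repo's actual code; all names here are hypothetical):

```python
import random

def train_one_epoch(model):
    """Stand-in for the real fine-tuning step (hypothetical)."""
    pass

def evaluate(model, split):
    """Stand-in for the real evaluation loop (hypothetical); returns an accuracy."""
    return random.random()

model, num_epochs = object(), 10  # placeholders
best_test_acc = 0.0
for epoch in range(num_epochs):
    train_one_epoch(model)
    # the running maximum of *test* accuracy is what ends up reported
    best_test_acc = max(best_test_acc, evaluate(model, "test"))
print(f"reported accuracy: {best_test_acc:.3f}")
```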

pierowu · Oct 18 '23 02:10

Hi, thanks for the interest! I was just notified that you raised the same question in the ELEVATER toolkit: https://github.com/Computer-Vision-in-the-Wild/Elevater_Toolkit_IC. So basically, the best results on the test set are reported, and the best weights are selected on the test set. This is the same setting used in the ELEVATER toolkit, for fair comparison.

jkooy · Oct 18 '23 06:10


That seems to open up the possibility of overfitting to the test set. However, since the ELEVATER benchmark uses this setting, perhaps there is no better option than to follow it.
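For reference, a val-based protocol would checkpoint on the validation split and touch the test set only once. A minimal sketch with the same stand-in helpers (all names hypothetical):

```python
import copy
import random

def train_one_epoch(model):
    """Stand-in training step (hypothetical)."""
    pass

def evaluate(model, split):
    """Stand-in evaluation (hypothetical); returns an accuracy."""
    return random.random()

model, num_epochs = object(), 10  # placeholders
best_val_acc, best_model = 0.0, None
for epoch in range(num_epochs):
    train_one_epoch(model)
    val_acc = evaluate(model, "val")
    if val_acc > best_val_acc:
        # checkpoint is chosen on the validation split only
        best_val_acc, best_model = val_acc, copy.deepcopy(model)
# the test set is used exactly once, with the val-selected checkpoint
print(f"reported accuracy: {evaluate(best_model, 'test'):.3f}")
```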

pierowu · Oct 18 '23 07:10