Discrepancy with BLIP paper results when using PyTorch > 1.10

Open oscmansan opened this issue 1 year ago • 2 comments

I was trying to reproduce results with BLIP on VQAv2 test-dev and I observed a non-negligible difference between the VQA accuracy obtained using the published checkpoint (77.41%) and the number reported in the paper (78.25%).

These are the steps I followed:

1. Clone this repo
2. Install dependencies with pip install .
3. Create a symlink cache/coco/images pointing to the local copy of the COCO images
4. Modify lavis/projects/blip/eval/vqav2_eval.yaml as follows:
5. Run python -m torch.distributed.run --nproc_per_node=4 evaluate.py --cfg-path lavis/projects/blip/eval/vqav2_eval.yaml (note that I only have 4 A100 GPUs available)
6. Submit the test_vqa_result.json file generated in lavis/output/BLIP/VQA/... to EvalAI
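
Since each EvalAI submission only returns an aggregate score, it can help to compare the result files from two runs locally first. The following is a minimal sketch, not part of LAVIS: it assumes test_vqa_result.json follows the usual VQA submission layout (a JSON list of objects with "question_id" and "answer" keys), and the helper names and file paths are hypothetical.

```python
import json


def load_answers(path):
    """Map question_id -> answer from a VQA-style result file.

    Assumes the file is a JSON list of {"question_id": ..., "answer": ...}.
    """
    with open(path) as f:
        return {row["question_id"]: row["answer"] for row in json.load(f)}


def diff_answers(run_a, run_b):
    """Return the sorted question ids answered differently in the two runs.

    Only questions present in both runs are compared.
    """
    shared = run_a.keys() & run_b.keys()
    return sorted(qid for qid in shared if run_a[qid] != run_b[qid])


# Hypothetical usage with two copies of the generated result file:
# changed = diff_answers(load_answers("run_a/test_vqa_result.json"),
#                        load_answers("run_b/test_vqa_result.json"))
# print(f"{len(changed)} answers differ between the two runs")
```

A non-empty diff between two runs of the same checkpoint points at the environment rather than the model weights.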

After some debugging, I narrowed it down to a discrepancy in PyTorch versions: I was using the latest version (1.13.0), while LAVIS pins the version to 1.10.0. So some change between PyTorch 1.10 and PyTorch 1.13 causes a performance degradation when loading a checkpoint trained on 1.10. After downgrading PyTorch to 1.10.0, I am able to achieve 78.24% VQA accuracy on VQAv2 test-dev, almost the same as the number reported in the paper (78.25%).
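
A mismatch like this is easy to miss, so one option is to check the installed version before running the evaluation. This is a rough sketch rather than anything LAVIS provides: the pinned value 1.10.0 is taken from the description above, the function names are hypothetical, and only the numeric release part of the version string is compared (a local build tag such as +cu113 is ignored).

```python
from importlib import metadata

PINNED_TORCH = "1.10.0"  # version LAVIS is reported to pin (see above)


def release_tuple(version):
    """Turn '1.13.0+cu117' into (1, 13, 0).

    Drops the local build tag after '+' and stops at the first
    non-numeric component.
    """
    public = version.split("+", 1)[0]
    parts = []
    for piece in public.split("."):
        if not piece.isdigit():
            break
        parts.append(int(piece))
    return tuple(parts)


def matches_pin(installed, pinned=PINNED_TORCH):
    """True if the installed release equals the pinned release."""
    return release_tuple(installed) == release_tuple(pinned)


def installed_torch_version():
    """Return the installed torch version string, or None if absent."""
    try:
        return metadata.version("torch")
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    found = installed_torch_version()
    if found is None:
        print("torch is not installed")
    elif not matches_pin(found):
        print(f"warning: torch {found} differs from pinned {PINNED_TORCH}")
```

Reading the version through importlib.metadata avoids importing torch itself, so the check stays cheap and also works in an environment where the package is missing.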