Xiaosong Zhang
I think the problem may be in the INPUT setting. In object detection it's best to use a scale that keeps the aspect ratio, so I recommend using (MIN_SIZE=800, MAX_SIZE=1333), or...
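To make the (MIN_SIZE, MAX_SIZE) setting concrete, here is a minimal sketch of the aspect-ratio-preserving resize rule used by maskrcnn_benchmark-style pipelines; the function name and signature are illustrative, not the repo's actual API:

```python
def resize_keep_ratio(w, h, min_size=800, max_size=1333):
    """Scale so the short side reaches min_size, capping the long side at max_size."""
    scale = min_size / min(w, h)
    if max(w, h) * scale > max_size:
        scale = max_size / max(w, h)  # fall back to the long-side cap
    return int(round(w * scale)), int(round(h * scale))

# A 640x480 image is scaled by its short side; a very wide 1920x1080 image
# is capped by MAX_SIZE instead. The aspect ratio is preserved in both cases.
```

Either way, objects are never stretched, which is why this scheme is preferred over resizing to a fixed square.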
RetinaNet and FreeAnchor initialize the classifier bias so that it predicts low scores at the start of training, so the loss from negatives is very small.
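Concretely, the trick is to set the bias of the final classification layer so that the initial sigmoid output equals a small prior probability (0.01 in the RetinaNet paper). A minimal sketch, with the PyTorch call shown only as a comment:

```python
import math

prior_prob = 0.01  # expected foreground probability at initialization (RetinaNet's pi)
bias_init = -math.log((1 - prior_prob) / prior_prob)  # approx -4.595

# sigmoid(bias_init) == prior_prob, so every anchor starts with score ~0.01
# and the huge number of easy negatives contributes almost no loss.
# In a PyTorch detection head this would be applied as, e.g.:
#   torch.nn.init.constant_(cls_logits.bias, bias_init)
```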
Please check your PyTorch version; we need PyTorch >= 1.1, as described in [INSTALL.md](https://github.com/zhangxiaosong18/FreeAnchor/blob/master/INSTALL.md).
This may be due to an incorrect `torchvision` version, please use `torchvision 0.2.1`.
Yes, it requires hand-crafted anchors.
Did you follow the INSTALL.md installation steps? `python setup.py build develop` compiles a dynamic link library (*.so) for the Python code. If you can't find this file in the `maskrcnn_benchmark` directory,...
It took about 10 hours to train free_anchor-R-50-FPN_1x on 8 RTX 2080 Ti GPUs.
Emu2-Chat is a generalist model, so we use the Generalist Performance from Table 3 of the [CogVLM arXiv paper](https://arxiv.org/pdf/2311.03079.pdf) instead of the single-task performance.
They are VQAv2 annotation files processed by [LAVIS](https://github.com/salesforce/LAVIS) and can be downloaded from https://storage.googleapis.com/sfr-vision-language-research/LAVIS/datasets/vqav2/vqa_val_eval.json and https://storage.googleapis.com/sfr-vision-language-research/LAVIS/datasets/vqav2/vqa_test.json
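For reference, a small sketch of fetching these two files with the Python standard library; the output directory and helper name are my own choices, not part of LAVIS:

```python
import urllib.request
from pathlib import Path

BASE = "https://storage.googleapis.com/sfr-vision-language-research/LAVIS/datasets/vqav2"
ANNOTATION_URLS = [f"{BASE}/vqa_val_eval.json", f"{BASE}/vqa_test.json"]

def download(url: str, out_dir: str = "annotations") -> Path:
    """Download one annotation file into out_dir and return its local path."""
    dest = Path(out_dir) / url.rsplit("/", 1)[-1]
    dest.parent.mkdir(parents=True, exist_ok=True)
    urllib.request.urlretrieve(url, dest)
    return dest

# Usage (hits the network, so not run here):
# for url in ANNOTATION_URLS:
#     download(url)
```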