Xiaosong Zhang

9 comments by Xiaosong Zhang

I think the problem may be in the INPUT setting. In object detection it's best to use a resize scale that preserves the image's aspect ratio, so I recommend using (MIN_SIZE=800, MAX_SIZE=1333), or...
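
As a sketch of what such an aspect-ratio-preserving resize computes (the function name and the example image size below are mine, not from any repo):

```python
def resize_scale(h, w, min_size=800, max_size=1333):
    # Scale so the shorter side becomes min_size, but cap the longer
    # side at max_size; the aspect ratio is preserved either way.
    scale = min_size / min(h, w)
    if max(h, w) * scale > max_size:
        scale = max_size / max(h, w)
    return scale

# Example: a 720x1280 frame, where the 1333 cap on the long side applies.
s = resize_scale(720, 1280)
new_h, new_w = round(720 * s), round(1280 * s)
```

Squashing both sides to a fixed square instead would distort objects, which is why the (MIN_SIZE, MAX_SIZE) pair is preferred.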

RetinaNet and FreeAnchor initialize the classifier bias so that it predicts low scores at the start of training, which keeps the loss contributed by the many negative anchors very small.
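
The idea behind that bias initialization, as a minimal sketch (the prior 0.01 is the value the RetinaNet paper uses; actually applying it means writing `bias_value` into the classification head's bias, e.g. with `nn.init.constant_` in PyTorch):

```python
import math

# Assumed foreground prior pi; RetinaNet uses pi = 0.01.
prior_prob = 0.01

# Choose the classifier bias b so that sigmoid(b) == prior_prob.
bias_value = -math.log((1 - prior_prob) / prior_prob)

# Every anchor then starts out predicting ~0.01 foreground probability,
# so the huge number of negative anchors contributes little loss.
initial_score = 1.0 / (1.0 + math.exp(-bias_value))
```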

Please check your PyTorch version; we need PyTorch >= 1.1, as described in [INSTALL.md](https://github.com/zhangxiaosong18/FreeAnchor/blob/master/INSTALL.md)

This may be due to an incorrect `torchvision` version; please use `torchvision 0.2.1`.

Did you follow the installation steps in INSTALL.md? `python setup.py build develop` compiles a dynamic link library (*.so) for the Python code. If you can't find this file in the `maskrcnn_benchmark` directory,...
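
One way to check from Python whether the compiled extension is importable (the module name `maskrcnn_benchmark._C` is my assumption about where the build places the *.so):

```python
import importlib.util

def has_compiled_ext(name="maskrcnn_benchmark._C"):
    # find_spec raises ModuleNotFoundError when the parent package
    # itself is missing, so treat that case as "not built" too.
    try:
        return importlib.util.find_spec(name) is not None
    except ModuleNotFoundError:
        return False

if not has_compiled_ext():
    print("Compiled extension missing; re-run `python setup.py build develop`.")
```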

It took about 10 hours to train free_anchor-R-50-FPN_1x with 8 RTX 2080 Ti GPUs.

Emu2-Chat is a generalist model, and we use the Generalist Performance from Table 3 of the [CogVLM arXiv paper](https://arxiv.org/pdf/2311.03079.pdf) instead of single-task performance. ![e7d01101-64a3-4314-ba29-6c010629b562](https://github.com/baaivision/Emu/assets/42880203/33a3fd20-d458-40ae-ad42-f34ddf781118)

They are VQAv2 annotation files processed by [LAVIS](https://github.com/salesforce/LAVIS) and can be downloaded from https://storage.googleapis.com/sfr-vision-language-research/LAVIS/datasets/vqav2/vqa_val_eval.json and https://storage.googleapis.com/sfr-vision-language-research/LAVIS/datasets/vqav2/vqa_test.json
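
A small helper to fetch those two files (the `annotations` output directory is just an assumption; nothing here is specific to LAVIS):

```python
import urllib.request
from pathlib import Path

# Annotation URLs from the comment above.
VQA_URLS = [
    "https://storage.googleapis.com/sfr-vision-language-research/LAVIS/datasets/vqav2/vqa_val_eval.json",
    "https://storage.googleapis.com/sfr-vision-language-research/LAVIS/datasets/vqav2/vqa_test.json",
]

def download_annotations(urls=VQA_URLS, out_dir="annotations"):
    # Download each file into out_dir, skipping ones already on disk.
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    paths = []
    for url in urls:
        dest = out / url.rsplit("/", 1)[-1]
        if not dest.exists():
            urllib.request.urlretrieve(url, dest)
        paths.append(dest)
    return paths
```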