jetyingjia

Results 3 issues of jetyingjia

I can't find any data augmentation in this code (e.g., scale, shift, motion blur), which is very important in training. Am I missing some code? Thank you for this project!

Great work! I am confused by the results in Table 6: is the reported performance from Alpha-CLIP with LLaVA-1.5, or from fine-tuning this model with vicuna-7b on these datasets (RefCOCOg or VG)?

Thanks for your great work! In your project, the caption branch is trained only on VG data. Its captioning ability may be poorer than that of a model trained on large-scale caption data...