Yongshuo Zong
Hi, we developed an ICL benchmark for VLLMs here: https://github.com/ys-zong/VL-ICL. You're welcome to try it out. @kennymckormick I wonder if you have a plan to integrate VL-ICL into this very useful...
Hi guys, you can use our codebase for ICL: https://github.com/ys-zong/VL-ICL
FYI. I didn't find a neat way for few-shot BLIP, but I implemented the few-shot inference of many other V-L models here: https://github.com/ys-zong/VL-ICL
Hi, thanks for your interest. We used exactly the same fine-tuning scripts as the original LLaVA (https://github.com/haotian-liu/LLaVA/blob/main/scripts/v1_5/finetune.sh) and MiniGPT-4 (https://github.com/Vision-CAIR/MiniGPT-4/blob/main/MiniGPTv2_Train.md). For example, for LLaVA fine-tuning, you can first convert our...
Thanks for your reply! Yes, I have cast all the outputs to lowercase. > "truncate the model output to the length of the longest ground truth answer" Does the "longest...
Great! Now I get an accuracy of 27.5% after truncation. Thanks a lot for the help! I'd still like to check your implementation to find the last minor difference...
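For reference, the lowercase-then-truncate evaluation discussed above can be sketched roughly as follows. This is a hypothetical illustration, not the repo's actual code: `truncated_accuracy` is an invented name, and I'm assuming "longest ground truth answer" means the longest answer across the whole dataset (measured in characters) rather than per sample.

```python
def truncated_accuracy(predictions, ground_truths):
    """Exact-match accuracy after lowercasing both sides and truncating
    each prediction to the length of the longest ground-truth answer.
    Illustrative sketch only; check the repo for the real protocol."""
    preds = [p.strip().lower() for p in predictions]
    gts = [g.strip().lower() for g in ground_truths]
    # Longest ground-truth answer length (characters) over the dataset.
    max_len = max(len(g) for g in gts)
    correct = sum(p[:max_len] == g for p, g in zip(preds, gts))
    return correct / len(gts)

# A verbose model output like "Yes, the lesion is benign." is truncated
# to "yes" before comparison, so it matches the short ground truth "yes".
print(truncated_accuracy(["Yes, the lesion is benign.", "no"], ["yes", "no"]))
```

Without the truncation step, verbose generations would never exactly match short ground-truth answers, which is why the reported accuracy jumps once it is applied.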
Hi, perhaps I didn't include OL3I in the arguments, as this dataset was added later. Feel free to add it yourself; the implementation should be very...
This should also be straightforward to implement, following the paper and the other pre-processing scripts. Let me know if you have any problems.
From a quick look at your code, it seems you didn't use the LLaVA [conversation template](https://github.com/ys-zong/VLGuard/blob/d889f8d04808635aad63148def0e46e4beb87afc/utils/model_utils.py#L17) but fed in the raw text directly, which may explain the differences. Can you modify...
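To illustrate the difference, here is a rough sketch of wrapping a raw question in a LLaVA-v1.5-style conversation template instead of passing the bare text. The system prompt and `USER:`/`ASSISTANT:` markers below follow the commonly used LLaVA v1.5 format, but `build_llava_prompt` is an invented helper; the linked `model_utils.py` has the actual template used in the repo.

```python
def build_llava_prompt(question: str) -> str:
    """Wrap a raw question in a LLaVA-v1.5-style conversation template.
    Illustrative only; defer to the repo's conversation template."""
    system = ("A chat between a curious human and an artificial intelligence "
              "assistant. The assistant gives helpful, detailed, and polite "
              "answers to the human's questions.")
    # <image> marks where the image tokens are spliced in; the prompt must
    # end with "ASSISTANT:" so the model generates the answer turn.
    return f"{system} USER: <image>\n{question} ASSISTANT:"

print(build_llava_prompt("Is this image safe for work?"))
```

Instruction-tuned models like LLaVA are trained only on prompts in this conversational format, so feeding raw text at evaluation time shifts the input distribution and can noticeably change the outputs.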
You could use our codebase for in-context learning: https://github.com/ys-zong/VL-ICL