davidluciolu
Hi! It is mentioned in the paper that > We eliminate certain abstract noun phrases that are challenging to recognize in the image, such as “time”, “love”, and “freedom”, to...
Hi! I read in Section 3.1 of your paper that > Specifically, we first obtain the “synonymy label” in WordNet2 of each noun in groundtruth captions. And then, we choose nouns...
Hello! Thanks for your work! I tried your DPO framework on LLaVA-1.5-7B with my own preference data. I found it weird that `rewards_train/chosen` first rises and then keeps decreasing...
For example, I want to test the Qwen2-VL-2B-Instruct model with some configurations like `max_pixels=6272000`. How do I set this when using lmdeploy? Also, how do I use a custom prompt when...
Thanks for your work! I've downloaded LLaVA-Video-178k dataset and I want to pick several **specific types of questions** for my research, according to Fig 3 in your paper. It seems...