LLaVA-Align
The same model_kwargs and model_kwargs_cd.
Hi. I noticed that you adopted the VCD code as your base. However, I found that VCD uses identical model_kwargs and model_kwargs_cd to generate tokens. This confuses me, because model_kwargs also contains past_key_values. That means the same past_key_values (the KV cache) is used whether the visual input is the original image or the distorted one. Is that operation correct?
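A toy, dependency-free sketch of the concern, assuming a KV cache behaves as an append-only record of every token already processed. The functions `toy_forward` and `contrastive_decode` and the token names are hypothetical illustrations, not code from VCD or LLaVA-Align. The sketch tracks which image the distorted-image (contrastive) branch is effectively conditioned on at each decoding step, with separate caches and with a shared one.

```python
from typing import List, Tuple

# Hypothetical stand-in for a KV cache: the list of all tokens processed so far.
Cache = List[str]


def toy_forward(new_tokens: List[str], cache: Cache) -> Tuple[str, Cache]:
    """Hypothetical forward pass (not a real model API).

    Appends new_tokens to the cache, as a real KV cache appends keys/values,
    and returns the image token the prediction is conditioned on, i.e. the
    first token in the full context that starts with "<img".
    """
    context = cache + new_tokens
    image = next(t for t in context if t.startswith("<img"))
    return image, context


def contrastive_decode(share_cache: bool, steps: int = 3) -> List[str]:
    """Return, per step, which image the distorted-image branch conditioned on."""
    prompt = ["Describe", "the", "image"]
    cache: Cache = []     # cache for the original-image branch
    cache_cd: Cache = []  # cache for the distorted-image branch
    seen_by_cd: List[str] = []
    for i in range(steps):
        if i == 0:
            # First step: each branch encodes its own image plus the prompt.
            inputs = ["<img_orig>"] + prompt
            inputs_cd = ["<img_distorted>"] + prompt
        else:
            # Later steps: only the newly sampled token is fed, as with a KV cache.
            inputs = inputs_cd = [f"tok{i}"]
        if share_cache:
            # Assumed failure mode: the distorted branch starts from the
            # original branch's cache instead of its own.
            cache_cd = cache
        _, next_cache = toy_forward(inputs, cache)
        image_cd, next_cache_cd = toy_forward(inputs_cd, cache_cd)
        seen_by_cd.append(image_cd)
        cache, cache_cd = next_cache, next_cache_cd
    return seen_by_cd
```

With separate caches, `contrastive_decode(share_cache=False)` returns `['<img_distorted>', '<img_distorted>', '<img_distorted>']`. With a shared cache it returns `['<img_distorted>', '<img_orig>', '<img_orig>']`: after the first step the contrastive branch would attend to keys and values computed from the original image. Whether this actually happens in the real code depends on how model_kwargs and model_kwargs_cd are updated between steps, which this sketch does not model.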
Thanks for pointing that out! I'm not entirely clear on the issue you're describing, though. Could you explain the problem in a bit more detail? I'd really appreciate your help in understanding why this operation might be incorrect.