Video-LLaMA
Difference between the self-implemented BLIP-2 and the HF version?
Hi, thanks for the great work!
While reading the code, I noticed that you use self-implemented versions of BLIP, BERT, etc., as opposed to directly importing the corresponding HF modules with the same names, for example BertLMHeadModel. Is this because you needed to modify the models to get better performance?