LLaVA [Question] Why is 'mm_vision_select_layer' == -2 in config?

Open fmy7834 opened this issue 7 months ago • 2 comments

Question

In the training scripts, 'mm_vision_select_layer' is set to -2, which means the output of the CLIP vision encoder's penultimate layer is used as the image features. Why not use the last layer's output instead?
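To make the indexing concrete, here is a minimal sketch of what a select-layer value of -2 picks out. It assumes the encoder exposes a list of hidden states (the embedding output followed by each layer's output, which is how Hugging Face models behave with `output_hidden_states=True`). The toy "layers" and the helper names `run_encoder` and `select_features` are hypothetical and are not LLaVA's actual code; dropping the first (CLS) token is likewise an assumption about a patch-only selection mode.

```python
# Toy stand-in for a vision encoder: each "layer" just adds 1 to every value,
# so the value in a hidden state tells us which layer produced it.
# Hypothetical helpers for illustration only; not LLaVA's or transformers' code.

def run_encoder(embeddings, num_layers):
    """Return all hidden states: the embeddings, then each layer's output."""
    hidden_states = [embeddings]
    current = embeddings
    for _ in range(num_layers):
        current = [[v + 1 for v in token] for token in current]
        hidden_states.append(current)
    return hidden_states


def select_features(hidden_states, select_layer=-2, drop_cls=True):
    """Pick one hidden state by index; optionally drop the leading CLS token."""
    feats = hidden_states[select_layer]
    return feats[1:] if drop_cls else feats


# 1 CLS token + 4 patch tokens, feature width 2 (sizes chosen for readability).
embeddings = [[0, 0] for _ in range(5)]
hidden_states = run_encoder(embeddings, num_layers=24)

# Index -2 is the output of layer 23 of 24, i.e. the penultimate layer.
features = select_features(hidden_states, select_layer=-2)
```

With 24 layers the list has 25 entries, so index -1 is the final layer's output and index -2 is the one just before it. This only shows which tensor the setting selects; it does not by itself answer why the penultimate layer is preferred.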

Jul 17 '24 09:07 fmy7834
