LLaVA

[Question] Why is 'mm_vision_select_layer' == -2 in config?

Open fmy7834 opened this issue 7 months ago • 2 comments

Question

In the training scripts, 'mm_vision_select_layer' is set to -2, which means the output of the penultimate layer of the CLIP vision encoder is used as the image features. Why not use the output of the last layer instead?
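To make the setting concrete, here is a minimal sketch of what a negative layer index does, assuming the vision tower exposes its per-layer outputs as a tuple (embedding output first, then one entry per transformer layer) and that the CLS token is dropped afterwards. No real CLIP model is loaded; `fake_hidden_states`, `select_features`, and the toy sizes are hypothetical names and values used only for illustration, not LLaVA's actual code.

```python
# Toy stand-in for an encoder's per-layer outputs; no real CLIP model is loaded.
NUM_LAYERS = 24   # assumed depth (CLIP ViT-L has 24 transformer layers)
NUM_PATCHES = 4   # tiny value, for illustration only


def fake_hidden_states(num_layers, num_patches):
    """Return a tuple with one entry per stage: embeddings plus each layer's output.

    Each entry is a list of token labels; token 0 plays the role of the CLS token.
    """
    states = []
    for layer_idx in range(num_layers + 1):  # index 0 = embedding output
        tokens = [f"layer{layer_idx}_cls"]
        tokens += [f"layer{layer_idx}_patch{p}" for p in range(num_patches)]
        states.append(tokens)
    return tuple(states)


def select_features(hidden_states, select_layer=-2, drop_cls=True):
    """Pick one layer's tokens by (possibly negative) index, optionally dropping CLS."""
    tokens = hidden_states[select_layer]
    return tokens[1:] if drop_cls else tokens


hidden_states = fake_hidden_states(NUM_LAYERS, NUM_PATCHES)
features = select_features(hidden_states, select_layer=-2)
print(features[0])  # first patch token of the penultimate layer
```

With these assumptions, -1 would select the final layer's output and -2 the one just before it, so the setting only changes which entry of the tuple is read.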

fmy7834 • Jul 17 '24 09:07