candle

Siglip2 model support request

Open · lucasjinreal opened this issue 8 months ago · 2 comments

Would you consider supporting the Siglip2 model as well?

lucasjinreal · Mar 09 '25 12:03

It seems that the fixed-size models are compatible with the initial version, so I've added a couple of v2 variants to the siglip example in #2800; hopefully that shows how to use them.

LaurentMazare · Mar 09 '25 13:03

Yes, however, the naflex variants have some differences:

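  1. It takes 2-dimensional inputs: each 16x16 RGB patch is flattened into a fixed 16x16x3 = 768-dimensional vector, used as a "patch block";
  2. It resizes each input so that its patch grid fits within a max_patches budget and interpolates the position embeddings to that grid, which lets it support inputs of various sizes and aspect ratios.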
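To make the two differences concrete, here is a minimal std-only Rust sketch. It is not candle's API, and every function name is hypothetical. Assumptions: pixels are `f32` in HWC layout; the target grid is the largest aspect-preserving one with `grid_h * grid_w <= max_patches`, found by binary search over the scale; values inside each 768-dim patch are ordered (row, col, channel); and the learned position-embedding grid is resized with plain bilinear interpolation (align_corners = false). As far as I know the transformers implementation also applies antialiasing when downsampling, which this sketch omits. The pixel resize itself is also left out.

```rust
/// Round `size * scale` up to a multiple of `patch`, keeping at least one patch.
fn scaled_size(size: usize, scale: f64, patch: usize) -> usize {
    let p = patch as f64;
    let s = ((size as f64 * scale) / p).ceil() * p;
    (s as usize).max(patch)
}

/// Binary-search the largest aspect-preserving scale whose patch grid
/// (grid_h * grid_w) fits within `max_patches` (assumed >= 1).
/// Returns (grid_h, grid_w) in patches.
fn patch_grid(h: usize, w: usize, patch: usize, max_patches: usize) -> (usize, usize) {
    let (mut lo, mut hi) = (1e-6_f64, 100.0_f64);
    while hi - lo >= 1e-5 {
        let mid = (lo + hi) / 2.0;
        let n = (scaled_size(h, mid, patch) / patch) * (scaled_size(w, mid, patch) / patch);
        if n <= max_patches { lo = mid } else { hi = mid }
    }
    (scaled_size(h, lo, patch) / patch, scaled_size(w, lo, patch) / patch)
}

/// Flatten an HWC image (height and width multiples of `patch`) into
/// row-major patches of length patch*patch*3, i.e. 768 for 16x16 RGB.
fn patchify(pixels: &[f32], h: usize, w: usize, patch: usize) -> Vec<Vec<f32>> {
    assert!(h % patch == 0 && w % patch == 0 && pixels.len() == h * w * 3);
    let (gh, gw) = (h / patch, w / patch);
    let mut out = Vec::with_capacity(gh * gw);
    for gy in 0..gh {
        for gx in 0..gw {
            let mut v = Vec::with_capacity(patch * patch * 3);
            for py in 0..patch {
                // One patch row is `patch` consecutive RGB pixels in HWC memory.
                let start = ((gy * patch + py) * w + gx * patch) * 3;
                v.extend_from_slice(&pixels[start..start + patch * 3]);
            }
            out.push(v);
        }
    }
    out
}

/// Bilinear resize (align_corners = false, no antialiasing) of a position-embedding
/// grid stored flat as [src_h][src_w][dim], producing [dst_h][dst_w][dim].
fn resize_pos_emb(emb: &[f32], src_h: usize, src_w: usize, dim: usize,
                  dst_h: usize, dst_w: usize) -> Vec<f32> {
    // Map a destination index to (low source index, high source index, weight of high).
    fn src_coord(d: usize, src: usize, dst: usize) -> (usize, usize, f32) {
        let x = ((d as f32 + 0.5) * src as f32 / dst as f32 - 0.5).max(0.0);
        let i0 = (x.floor() as usize).min(src - 1);
        let i1 = (i0 + 1).min(src - 1);
        (i0, i1, x - i0 as f32)
    }
    let mut out = vec![0.0; dst_h * dst_w * dim];
    for y in 0..dst_h {
        let (y0, y1, ly) = src_coord(y, src_h, dst_h);
        for x in 0..dst_w {
            let (x0, x1, lx) = src_coord(x, src_w, dst_w);
            let at = |r: usize, c: usize, k: usize| emb[(r * src_w + c) * dim + k];
            for k in 0..dim {
                let top = at(y0, x0, k) * (1.0 - lx) + at(y0, x1, k) * lx;
                let bot = at(y1, x0, k) * (1.0 - lx) + at(y1, x1, k) * lx;
                out[(y * dst_w + x) * dim + k] = top * (1.0 - ly) + bot * ly;
            }
        }
    }
    out
}
```

For example, with `patch = 16` and `max_patches = 256`, a 224x224 image maps to a 16x16 grid and a 100x400 image to an 8x32 grid. A real port would express these loops as tensor reshapes and permutes. The loops are only there to make the memory layout explicit, which matters because the patch ordering has to match the checkpoint's patch-embedding weights.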

The naflex version performs better IMO, so it might be worth supporting; many users are now using it to replace siglip1.

I would like to write a PR to support it, but I don't really know where to start. My new model is essentially a tiny VLM trained with Siglip2-naflex and qwen2.5-0.5b. Is there an existing model I could use as a reference and modify to add support? (I think moondream is quite similar, but it uses siglip1 as its vision encoder.) (By the way, the tiny VLM I trained using siglip2 is truly excellent: in my experience it is much better at OCR and understanding than other current small VLMs.)

lucasjinreal · Mar 09 '25 14:03