candle Siglip2 model support request

Open lucasjinreal opened this issue 8 months ago • 2 comments

Would you consider supporting the Siglip2 model as well?

Mar 09 '25 12:03 lucasjinreal

It seems that the fixed-size models are compatible with the initial version, so I've added a couple of v2 variants to the siglip example in #2800. Hopefully this shows how to use them.

Mar 09 '25 13:03 LaurentMazare

Yes; however, the naflex variants differ in a few ways:

It takes 2-dimensional inputs: each 16x16 RGB patch is flattened into a fixed 16x16x3=768-dimensional vector and treated as a "patch block";
It uses interpolation together with a max_patches limit to support inputs of various sizes.
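
To make the first point concrete, here is a minimal pure-Python sketch of the flattening step. It is not taken from the Siglip2 reference code or from candle: the function names `patchify` and `pad_to_max_patches`, the row-major, channel-last ordering inside each patch, and the zero-padding with a mask up to `max_patches` are all assumptions made for illustration.

```python
from typing import List, Tuple

Pixel = List[float]        # [r, g, b]
Image = List[List[Pixel]]  # height x width x 3


def patchify(image: Image, patch: int = 16) -> List[List[float]]:
    """Flatten each patch x patch RGB block into one vector of patch*patch*3 values.

    Assumption: pixels are read row by row inside a patch, channels last.
    """
    height, width = len(image), len(image[0])
    if height % patch or width % patch:
        raise ValueError("image sides must be multiples of the patch size")
    patches = []
    for top in range(0, height, patch):
        for left in range(0, width, patch):
            vec: List[float] = []
            for y in range(top, top + patch):
                for x in range(left, left + patch):
                    vec.extend(image[y][x])
            patches.append(vec)
    return patches


def pad_to_max_patches(
    patches: List[List[float]], max_patches: int
) -> Tuple[List[List[float]], List[int]]:
    """Zero-pad the patch sequence to max_patches and return a validity mask.

    Assumption: padding plus a 0/1 mask is one plausible way to batch
    variable-sized inputs; the real model may handle this differently.
    """
    if len(patches) > max_patches:
        raise ValueError("too many patches; resize the image first")
    dim = len(patches[0])
    missing = max_patches - len(patches)
    padded = patches + [[0.0] * dim for _ in range(missing)]
    mask = [1] * len(patches) + [0] * missing
    return padded, mask


# A 32x48 image gives a 2x3 grid, i.e. 6 patches of 16*16*3 = 768 values each.
image = [[[0.0, 0.5, 1.0] for _ in range(48)] for _ in range(32)]
patches = patchify(image)
padded, mask = pad_to_max_patches(patches, max_patches=8)
print(len(patches), len(patches[0]), len(padded), sum(mask))
```

The model would then see a sequence of 768-dimensional rows rather than a fixed-size image tensor, which is why the fixed-size siglip code path does not apply directly.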

The naflex version performs better IMO, so it may be worth adding support; many users are now using it to replace siglip1.

I would like to write a PR to support it, but I don't really know where to start. My new model is essentially a tiny VLM trained with Siglip2-naflex and qwen2.5-0.5b. Is there an existing model I can use as a reference and modify? (I think moondream is quite similar, but it uses siglip1 as its vision encoder.) (By the way, the tiny VLM I trained using siglip2 is truly excellent: it is much better at OCR and understanding than other small VLMs at present.)

Mar 09 '25 14:03 lucasjinreal
