candle
Siglip2 model support request
Would you consider supporting the Siglip2 model as well?
It seems that the fixed-size models are compatible with the initial version, so I've added a couple of v2 variants to the siglip example in #2800, which should hopefully show how to use them.
Yes, however, the naflex variants differ in a few ways:
- they take 2-dimensional inputs: each 16x16 RGB patch is flattened into a fixed 16x16x3=768-dimensional vector that serves as a "patch block";
- they use interpolation with a max_patches limit to support variable input sizes.
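To make the first point concrete, here is a minimal sketch of the patch flattening in plain Rust, using only the standard library. `patchify` is a hypothetical helper, not part of candle or of any Siglip2 implementation. It assumes the image is stored row-major as height x width x channel, that both sides are already multiples of 16, and that values inside a patch are ordered row, then column, then channel. That ordering is an assumption and should be checked against the reference preprocessing before relying on it.

```rust
/// Patch side length and channel count from the discussion: 16 * 16 * 3 = 768.
const PATCH: usize = 16;
const CHANNELS: usize = 3;

/// Hypothetical helper (not a candle API): splits a row-major HWC image into
/// non-overlapping 16x16 patches and flattens each one into 768 values.
/// The within-patch ordering (row, column, channel) is an assumption.
fn patchify(image: &[f32], height: usize, width: usize) -> Vec<Vec<f32>> {
    assert_eq!(image.len(), height * width * CHANNELS, "unexpected buffer size");
    assert!(
        height % PATCH == 0 && width % PATCH == 0,
        "height and width must be multiples of the patch size"
    );
    let (rows, cols) = (height / PATCH, width / PATCH);
    let mut patches = Vec::with_capacity(rows * cols);
    for pr in 0..rows {
        for pc in 0..cols {
            let mut patch = Vec::with_capacity(PATCH * PATCH * CHANNELS);
            for y in 0..PATCH {
                // Offset of the first value of this patch row in the full image.
                let start = ((pr * PATCH + y) * width + pc * PATCH) * CHANNELS;
                // One patch row is PATCH pixels * CHANNELS contiguous values.
                patch.extend_from_slice(&image[start..start + PATCH * CHANNELS]);
            }
            patches.push(patch);
        }
    }
    patches
}
```

A 32x48 image would yield 2 x 3 = 6 patches of 768 values each, so the model input has shape (num_patches, 768) rather than (channels, height, width). The resizing step that keeps the patch count under max_patches is not covered by this sketch.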
The naflex version performs better imo, so it may be worth adding support; many users are now using it to replace siglip1.
I would like to write a PR to support it, but I don't really know where to start. My new model is essentially a tiny VLM trained with Siglip2-naflex and qwen2.5-0.5b. Is there an existing model I could use as a reference and modify to support it? (I think moondream is quite similar, but it uses siglip1 as the vision encoder.) (The tiny VLM that I trained using siglip2 is truly excellent, by the way; it is much better at OCR and understanding than other small VLMs at present.)