Request for SigLIP2 Training Code and Pretrained Weights
Hi there,
I noticed that SigLIP2 was introduced in release 2.31. I was wondering if there are any plans to include the training code for this model as well. Given the similarities, it seems like CoCa could be a good reference point, as mentioned in Hugging Face's explanation.
Additionally, does this mean that the pretrained weights for the AR decoder and the EMA image encoder will also be released?
Looking forward to your response—thanks!
@eitamarSaraf Google did not release official weights for those parts of the model.
A full implementation of what's described in SigLIP2 is a fair bit of work. Yes, what was done for CoCa could definitely be leveraged (though LocCa is a bigger undertaking than CoCa), but the coordination and compute, not to mention the additional dataset work needed to achieve good results, would be significant. So I doubt it will be done, but who knows.
To date, many contributions to this code base could not be merged because they did not account for regressions to existing functionality or for the long-term maintainability of the project. They focused only on the feature being added.
@rwightman Maybe we could start small, like adding just SILC or TIPS? I can give it a shot—do you think that would be a useful contribution?
I am super excited to use SigLIP2 after reading the paper! I would love to help out @eitamarSaraf or review your contribution 😄 When you say "I noticed that SigLIP2 was introduced in release 2.31", what are you referring to?