big_vision

Official codebase used to develop Vision Transformer, SigLIP, MLP-Mixer, LiT and more.

Results 66 big_vision issues

Hi, I found it non-trivial to set up a vanilla Python environment to work with `big_vision`. Would including a pyproject.toml in the root be valuable for the project? If so...

Hello, I want to get a token from SigLIP like the "cls token" in CLIP. Does SigLIP have such a token that can be used to represent the main feature of the...

https://github.com/google-research/big_vision/blob/main/big_vision/configs/proj/paligemma/finetune_paligemma.ipynb Can someone please help?

```
def resample_patchemb(old, new_hw):
  """Resample the weights of the patch embedding kernel to target resolution.

  We resample the patch embedding kernel by approximately inverting the effect of patch resizing. Colab...
```

Thank you for your great work! Could you provide the pretraining code for PaliGemma 2 series which uses TPU? It would be nice if we could train a model from...

Hi, just wondering if the languages this model supports are documented anywhere? I see two papers, the SigLIP paper and PaLI: https://arxiv.org/pdf/2303.15343 and https://arxiv.org/abs/2209.06794. I can find reference to 109 languages...

Hello, I want to use PaliGemma to segment water in satellite images. However, I haven't been able to find any documentation on how to scale the points inside my mask...

When running this snippet from [HuggingFace](https://huggingface.co/google/paligemma-3b-pt-224)

```
from transformers import AutoProcessor, PaliGemmaForConditionalGeneration
from PIL import Image
import requests
import torch

model_id = "google/paligemma-3b-mix-224"
device = "cuda:0"
dtype = torch.bfloat16
url...
```

## Summary

We evaluated SigLIP 2 models (`siglip2-base-patch16-224` and `siglip2-giant-opt-patch16-384`) for text-based person re-identification (ReID) on standard benchmarks including **Market-1501** and **RSTPReid**. While image-to-image retrieval works reasonably well, **text-to-image retrieval...