Support for SigLIP 2 models
Model description
Hi, many thanks for your great work bringing infinity-emb to life, which solved a ton of problems for me and saved a lot of time! However, when I try to switch from SigLIP to SigLIP 2 models, since they outperform the old generation, infinity-emb crashes, apparently complaining about a missing architectures field in config.json:
jo@aibox:~/rf$ docker compose logs infinity-emb
infinity-emb-1 | INFO: Started server process [1]
infinity-emb-1 | INFO: Waiting for application startup.
infinity-emb-1 | INFO 2025-05-05 15:11:06,732 infinity_emb INFO: infinity_server.py:84
infinity-emb-1 | Creating 1engines:
infinity-emb-1 | engines=['google/siglip2-large-patch16-384']
infinity-emb-1 | INFO 2025-05-05 15:11:06,735 infinity_emb INFO: Anonymized telemetry.py:30
infinity-emb-1 | telemetry can be disabled via environment variable
infinity-emb-1 | DO_NOT_TRACK=1.
infinity-emb-1 | INFO 2025-05-05 15:11:06,740 infinity_emb INFO: select_model.py:64
infinity-emb-1 | model=google/siglip2-large-patch16-384 selected,
infinity-emb-1 | using engine=torch and device=cuda
infinity-emb-1 | ERROR: Traceback (most recent call last):
infinity-emb-1 | File "/app/.venv/lib/python3.10/site-packages/starlette/routing.py", line 693, in lifespan
infinity-emb-1 | async with self.lifespan_context(app) as maybe_state:
infinity-emb-1 | File "/usr/lib/python3.10/contextlib.py", line 199, in aenter
infinity-emb-1 | return await anext(self.gen)
infinity-emb-1 | File "/app/infinity_emb/infinity_server.py", line 88, in lifespan
infinity-emb-1 | app.engine_array = AsyncEngineArray.from_args(engine_args_list) # type: ignore
infinity-emb-1 | File "/app/infinity_emb/engine.py", line 306, in from_args
infinity-emb-1 | return cls(engines=tuple(engines))
infinity-emb-1 | File "/app/infinity_emb/engine.py", line 71, in from_args
infinity-emb-1 | engine = cls(**engine_args.to_dict(), _show_deprecation_warning=False)
infinity-emb-1 | File "/app/infinity_emb/engine.py", line 56, in init
infinity-emb-1 | self._model_replicas, self._min_inference_t, self._max_inference_t = select_model(
infinity-emb-1 | File "/app/infinity_emb/inference/select_model.py", line 81, in select_model
infinity-emb-1 | loaded_engine = unloaded_engine.value(engine_args=engine_args_copy)
infinity-emb-1 | File "/app/infinity_emb/transformer/vision/torch_vision.py", line 47, in init
infinity-emb-1 | self.is_colipali = config.architectures[0] in IMAGE_COL_MODELS
infinity-emb-1 | TypeError: 'NoneType' object is not subscriptable
infinity-emb-1 |
infinity-emb-1 | ERROR: Application startup failed. Exiting.
Any chance of adding compatibility for these models here?
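For what it's worth, the crash itself is just the None in config.architectures being indexed. A purely hypothetical guard around the failing spot in torch_vision.py (a sketch only, reusing the module's own config and IMAGE_COL_MODELS objects, not an actual patch from the repo) would look roughly like:
# torch_vision.py, around line 47 (sketch only)
architectures = getattr(config, "architectures", None) or []
self.is_colipali = bool(architectures) and architectures[0] in IMAGE_COL_MODELS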
Something else I noticed, not related to this: if I use --url-prefix /v1 in my v2 command, the prefix is also added to the rerank route. The OpenAI-compatible endpoints I need are /rerank and /v1/embeddings, so I currently run two instances of infinity-emb, roughly as sketched below. That works fine and this is just meant as a note; maybe you want to 'fix' this in a future release.
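For illustration, the two-instance setup looks roughly like this (ports are made up, <some-reranker-model> is just a placeholder, and I am assuming the v2 CLI's --port flag here):
infinity_emb v2 --model-id google/siglip2-large-patch16-384 --url-prefix /v1 --port 7997
infinity_emb v2 --model-id <some-reranker-model> --port 7998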
Kind regards, Josh
Open source status & huggingface transformers.
- [x] The model implementation is available on transformers
- [x] The model weights are available on huggingface-hub
- [x] I verified that the model is currently not running in the latest version: pip install infinity_emb[all] --upgrade
- [ ] I made the authors of the model aware that I want to use it with infinity_emb & check if they are aware of the issue.
Hi,
it might be the missing architectures field; I added a PR to add it. Can you try with --revision refs/pr/3?
infinity_emb v2 --model-id google/siglip2-large-patch16-384 --revision refs/pr/3
Many thanks for the fast response! I tried adding the architectures field manually, which didn't help, and neither did --revision refs/pr/3:
infinity-emb-1 | INFO 2025-05-05 17:31:44,465 infinity_emb INFO: select_model.py:64
infinity-emb-1 | model=google/siglip2-large-patch16-384 selected,
infinity-emb-1 | using engine=torch and device=cuda
infinity-emb-1 | The tokenizer class you load from this checkpoint is not the same type as the class this function is called from. It may result in unexpected tokenization.
infinity-emb-1 | The tokenizer class you load from this checkpoint is 'GemmaTokenizer'.
infinity-emb-1 | The class this function is called from is 'SiglipTokenizer'.
infinity-emb-1 | ERROR: Traceback (most recent call last):
infinity-emb-1 | File "/app/.venv/lib/python3.10/site-packages/starlette/routing.py", line 693, in lifespan
infinity-emb-1 | async with self.lifespan_context(app) as maybe_state:
infinity-emb-1 | File "/usr/lib/python3.10/contextlib.py", line 199, in aenter
infinity-emb-1 | return await anext(self.gen)
infinity-emb-1 | File "/app/infinity_emb/infinity_server.py", line 88, in lifespan
infinity-emb-1 | app.engine_array = AsyncEngineArray.from_args(engine_args_list) # type: ignore
infinity-emb-1 | File "/app/infinity_emb/engine.py", line 306, in from_args
infinity-emb-1 | return cls(engines=tuple(engines))
infinity-emb-1 | File "/app/infinity_emb/engine.py", line 71, in from_args
infinity-emb-1 | engine = cls(**engine_args.to_dict(), _show_deprecation_warning=False)
infinity-emb-1 | File "/app/infinity_emb/engine.py", line 56, in init
infinity-emb-1 | self._model_replicas, self._min_inference_t, self._max_inference_t = select_model(
infinity-emb-1 | File "/app/infinity_emb/inference/select_model.py", line 81, in select_model
infinity-emb-1 | loaded_engine = unloaded_engine.value(engine_args=engine_args_copy)
infinity-emb-1 | File "/app/infinity_emb/transformer/vision/torch_vision.py", line 95, in init
infinity-emb-1 | self.processor = AutoProcessor.from_pretrained(
infinity-emb-1 | File "/app/.venv/lib/python3.10/site-packages/transformers/models/auto/processing_auto.py", line 333, in from_pretrained
infinity-emb-1 | return processor_class.from_pretrained(
infinity-emb-1 | File "/app/.venv/lib/python3.10/site-packages/transformers/processing_utils.py", line 1035, in from_pretrained
infinity-emb-1 | args = cls._get_arguments_from_pretrained(pretrained_model_name_or_path, **kwargs)
infinity-emb-1 | File "/app/.venv/lib/python3.10/site-packages/transformers/processing_utils.py", line 1081, in _get_arguments_from_pretrained
infinity-emb-1 | args.append(attribute_class.from_pretrained(pretrained_model_name_or_path, **kwargs))
infinity-emb-1 | File "/app/.venv/lib/python3.10/site-packages/transformers/tokenization_utils_base.py", line 2044, in from_pretrained
infinity-emb-1 | return cls._from_pretrained(
infinity-emb-1 | File "/app/.venv/lib/python3.10/site-packages/transformers/tokenization_utils_base.py", line 2284, in _from_pretrained
infinity-emb-1 | tokenizer = cls(*init_inputs, **init_kwargs)
infinity-emb-1 | File "/app/.venv/lib/python3.10/site-packages/transformers/models/siglip/tokenization_siglip.py", line 123, in init
infinity-emb-1 | self.sp_model = self.get_spm_processor()
infinity-emb-1 | File "/app/.venv/lib/python3.10/site-packages/transformers/models/siglip/tokenization_siglip.py", line 139, in get_spm_processor
infinity-emb-1 | with open(self.vocab_file, "rb") as f:
infinity-emb-1 | TypeError: expected str, bytes or os.PathLike object, not NoneType
infinity-emb-1 |
infinity-emb-1 | ERROR: Application startup failed. Exiting.
One step closer. Can you update the package to transformers==4.51.3? That should do the trick.
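A quick standalone way to check whether the installed transformers resolves the right tokenizer for this checkpoint (just a sanity check outside of infinity_emb; SigLIP 2 support only exists in newer transformers releases):
from transformers import AutoProcessor

# On a transformers version with SigLIP 2 support this should report a Gemma-based
# tokenizer rather than SiglipTokenizer, which is what the traceback above shows.
processor = AutoProcessor.from_pretrained("google/siglip2-large-patch16-384")
print(type(processor.tokenizer).__name__)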
Thanks but no luck here since:
~/rf/infinity/libs/infinity_emb$ poetry add transformers@4.51.3
Creating virtualenv infinity-emb in /home/jo/rf/infinity/libs/infinity_emb/.venv
Updating dependencies
Resolving dependencies... (1.7s)
Because optimum[onnxruntime] (1.24.0) depends on transformers (>=4.36,<4.49.0) and no versions of optimum match >1.24.0, optimum[onnxruntime] (>=1.24.0) requires transformers (>=4.36,<4.49.0). So, because infinity-emb depends on both optimum[onnxruntime] (>=1.24.0) and transformers (4.51.3), version solving failed.
Try working around poetry's version checking by using pip: 'poetry run pip3 install transformers==4.51.3'
OK, this was quite a journey, but it is now working rock solid with transformers 4.51.3 in a Docker deployment with the new Google SigLIP 2 models, and they perform great!
I created 4 diff files that might help and save some time when updating the official build to transformers 4.51.3, which I can share (how?).
The diffs are for:
torch.py
acceleration.py
select_model.py
pyproject.toml
On top of the transformers upgrade, I also added flash_attention_2 support to my copy of the project:
flash-attn = {version = ">=2.0.0", optional=true}
diffs attached here: https://github.com/michaelfeil/infinity/discussions/582
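For context, requesting flash attention 2 through transformers typically looks roughly like this (a sketch only, assuming the SigLIP 2 model class supports the flash_attention_2 implementation; the exact wiring in my diffs differs):
import torch
from transformers import AutoModel

# Requires the flash-attn package, a CUDA device, and a half-precision dtype.
model = AutoModel.from_pretrained(
    "google/siglip2-large-patch16-384",
    torch_dtype=torch.float16,
    attn_implementation="flash_attention_2",
)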
Since this has been open for a while and my modifications still work perfectly, at least for me... are there any plans to merge this into main?