Support for SigLIP 2 models
Model description
Hi, many thanks for your great work bringing infinity-emb to life, which solved a ton of problems for me and saved a lot of time! However, when I try to switch from SigLIP to SigLIP 2 models, since they outperform the old generation, infinity-emb crashes, apparently complaining about a missing architectures field in config.json:
jo@aibox:~/rf$ docker compose logs infinity-emb
infinity-emb-1 | INFO: Started server process [1]
infinity-emb-1 | INFO: Waiting for application startup.
infinity-emb-1 | INFO 2025-05-05 15:11:06,732 infinity_emb INFO: infinity_server.py:84
infinity-emb-1 | Creating 1engines:
infinity-emb-1 | engines=['google/siglip2-large-patch16-384']
infinity-emb-1 | INFO 2025-05-05 15:11:06,735 infinity_emb INFO: Anonymized telemetry.py:30
infinity-emb-1 | telemetry can be disabled via environment variable
infinity-emb-1 | DO_NOT_TRACK=1.
infinity-emb-1 | INFO 2025-05-05 15:11:06,740 infinity_emb INFO: select_model.py:64
infinity-emb-1 | model=google/siglip2-large-patch16-384 selected,
infinity-emb-1 | using engine=torch and device=cuda
infinity-emb-1 | ERROR: Traceback (most recent call last):
infinity-emb-1 | File "/app/.venv/lib/python3.10/site-packages/starlette/routing.py", line 693, in lifespan
infinity-emb-1 | async with self.lifespan_context(app) as maybe_state:
infinity-emb-1 | File "/usr/lib/python3.10/contextlib.py", line 199, in aenter
infinity-emb-1 | return await anext(self.gen)
infinity-emb-1 | File "/app/infinity_emb/infinity_server.py", line 88, in lifespan
infinity-emb-1 | app.engine_array = AsyncEngineArray.from_args(engine_args_list) # type: ignore
infinity-emb-1 | File "/app/infinity_emb/engine.py", line 306, in from_args
infinity-emb-1 | return cls(engines=tuple(engines))
infinity-emb-1 | File "/app/infinity_emb/engine.py", line 71, in from_args
infinity-emb-1 | engine = cls(**engine_args.to_dict(), _show_deprecation_warning=False)
infinity-emb-1 | File "/app/infinity_emb/engine.py", line 56, in init
infinity-emb-1 | self._model_replicas, self._min_inference_t, self._max_inference_t = select_model(
infinity-emb-1 | File "/app/infinity_emb/inference/select_model.py", line 81, in select_model
infinity-emb-1 | loaded_engine = unloaded_engine.value(engine_args=engine_args_copy)
infinity-emb-1 | File "/app/infinity_emb/transformer/vision/torch_vision.py", line 47, in init
infinity-emb-1 | self.is_colipali = config.architectures[0] in IMAGE_COL_MODELS
infinity-emb-1 | TypeError: 'NoneType' object is not subscriptable
infinity-emb-1 |
infinity-emb-1 | ERROR: Application startup failed. Exiting.
Any chance of adding compatibility for these models here?
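For what it's worth, the crash itself is just the None in config.architectures being indexed. A purely hypothetical guard around the failing spot in torch_vision.py (a sketch only, reusing the module's own config and IMAGE_COL_MODELS objects, not an actual patch from the repo) would look roughly like:
# torch_vision.py, around line 47 (sketch only)
architectures = getattr(config, "architectures", None) or []
self.is_colipali = bool(architectures) and architectures[0] in IMAGE_COL_MODELS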
Something else I noticed, not related to this: if I use --url-prefix /v1 in my v2 command, the prefix is also added to the rerank route. The OpenAI-compatible endpoints I need are /rerank and /v1/embeddings, so I currently run two instances of infinity-emb, roughly as sketched below. That works fine and this is just meant as a note; maybe you want to 'fix' this in a future release.
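For illustration, the two-instance setup looks roughly like this (ports are made up, <some-reranker-model> is just a placeholder, and I am assuming the v2 CLI's --port flag here):
infinity_emb v2 --model-id google/siglip2-large-patch16-384 --url-prefix /v1 --port 7997
infinity_emb v2 --model-id <some-reranker-model> --port 7998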
Kind regards, Josh
Open source status & huggingface transformers.
- [x] The model implementation is available on transformers
- [x] The model weights are available on huggingface-hub
- [x] I verified that the model is currently not running in the latest version: pip install infinity_emb[all] --upgrade
- [ ] I made the authors of the model aware that I want to use it with infinity_emb & check if they are aware of the issue.
Hi,
it might be the missing architectures field; I added a PR to add it. Can you try with --revision refs/pr/3?
infinity_emb v2 --model-id google/siglip2-large-patch16-384 --revision refs/pr/3
Many thanks for the fast response! I tried adding the architectures field manually, which didn't help, and neither did --revision refs/pr/3:
infinity-emb-1 | INFO 2025-05-05 17:31:44,465 infinity_emb INFO: select_model.py:64
infinity-emb-1 | model=google/siglip2-large-patch16-384 selected,
infinity-emb-1 | using engine=torch and device=cuda
infinity-emb-1 | The tokenizer class you load from this checkpoint is not the same type as the class this function is called from. It may result in unexpected tokenization.
infinity-emb-1 | The tokenizer class you load from this checkpoint is 'GemmaTokenizer'.
infinity-emb-1 | The class this function is called from is 'SiglipTokenizer'.
infinity-emb-1 | ERROR: Traceback (most recent call last):
infinity-emb-1 | File "/app/.venv/lib/python3.10/site-packages/starlette/routing.py", line 693, in lifespan
infinity-emb-1 | async with self.lifespan_context(app) as maybe_state:
infinity-emb-1 | File "/usr/lib/python3.10/contextlib.py", line 199, in aenter
infinity-emb-1 | return await anext(self.gen)
infinity-emb-1 | File "/app/infinity_emb/infinity_server.py", line 88, in lifespan
infinity-emb-1 | app.engine_array = AsyncEngineArray.from_args(engine_args_list) # type: ignore
infinity-emb-1 | File "/app/infinity_emb/engine.py", line 306, in from_args
infinity-emb-1 | return cls(engines=tuple(engines))
infinity-emb-1 | File "/app/infinity_emb/engine.py", line 71, in from_args
infinity-emb-1 | engine = cls(**engine_args.to_dict(), _show_deprecation_warning=False)
infinity-emb-1 | File "/app/infinity_emb/engine.py", line 56, in init
infinity-emb-1 | self._model_replicas, self._min_inference_t, self._max_inference_t = select_model(
infinity-emb-1 | File "/app/infinity_emb/inference/select_model.py", line 81, in select_model
infinity-emb-1 | loaded_engine = unloaded_engine.value(engine_args=engine_args_copy)
infinity-emb-1 | File "/app/infinity_emb/transformer/vision/torch_vision.py", line 95, in init
infinity-emb-1 | self.processor = AutoProcessor.from_pretrained(
infinity-emb-1 | File "/app/.venv/lib/python3.10/site-packages/transformers/models/auto/processing_auto.py", line 333, in from_pretrained
infinity-emb-1 | return processor_class.from_pretrained(
infinity-emb-1 | File "/app/.venv/lib/python3.10/site-packages/transformers/processing_utils.py", line 1035, in from_pretrained
infinity-emb-1 | args = cls._get_arguments_from_pretrained(pretrained_model_name_or_path, **kwargs)
infinity-emb-1 | File "/app/.venv/lib/python3.10/site-packages/transformers/processing_utils.py", line 1081, in _get_arguments_from_pretrained
infinity-emb-1 | args.append(attribute_class.from_pretrained(pretrained_model_name_or_path, **kwargs))
infinity-emb-1 | File "/app/.venv/lib/python3.10/site-packages/transformers/tokenization_utils_base.py", line 2044, in from_pretrained
infinity-emb-1 | return cls._from_pretrained(
infinity-emb-1 | File "/app/.venv/lib/python3.10/site-packages/transformers/tokenization_utils_base.py", line 2284, in _from_pretrained
infinity-emb-1 | tokenizer = cls(*init_inputs, **init_kwargs)
infinity-emb-1 | File "/app/.venv/lib/python3.10/site-packages/transformers/models/siglip/tokenization_siglip.py", line 123, in init
infinity-emb-1 | self.sp_model = self.get_spm_processor()
infinity-emb-1 | File "/app/.venv/lib/python3.10/site-packages/transformers/models/siglip/tokenization_siglip.py", line 139, in get_spm_processor
infinity-emb-1 | with open(self.vocab_file, "rb") as f:
infinity-emb-1 | TypeError: expected str, bytes or os.PathLike object, not NoneType
infinity-emb-1 |
infinity-emb-1 | ERROR: Application startup failed. Exiting.
One step closer. Can you update the package to transformers==4.51.3? That should do the trick.
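A quick standalone way to check whether the installed transformers resolves the right tokenizer for this checkpoint (just a sanity check outside of infinity_emb; SigLIP 2 support only exists in newer transformers releases):
from transformers import AutoProcessor

# On a transformers version with SigLIP 2 support this should report a Gemma-based
# tokenizer rather than SiglipTokenizer, which is what the traceback above shows.
processor = AutoProcessor.from_pretrained("google/siglip2-large-patch16-384")
print(type(processor.tokenizer).__name__)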
Thanks but no luck here since:
~/rf/infinity/libs/infinity_emb$ poetry add transformers@4.51.3
Creating virtualenv infinity-emb in /home/jo/rf/infinity/libs/infinity_emb/.venv
Updating dependencies
Resolving dependencies... (1.7s)
Because optimum[onnxruntime] (1.24.0) depends on transformers (>=4.36,<4.49.0) and no versions of optimum match >1.24.0, optimum[onnxruntime] (>=1.24.0) requires transformers (>=4.36,<4.49.0). So, because infinity-emb depends on both optimum[onnxruntime] (>=1.24.0) and transformers (4.51.3), version solving failed.
Try working around poetry's version checking by using pip: 'poetry run pip3 install transformers==4.51.3'
OK, this was quite a journey, but it is now working rock solid with transformers 4.51.3 in a Docker deployment with the new Google SigLIP 2 models, and they perform great!
I created 4 diff files that might help and save some time when updating the official build to transformers 4.51.3, which I can share (how?).
The diffs are for:
torch.py
acceleration.py
select_model.py
pyproject.toml
On top of the transformers upgrade, I also added flash_attention_2 support to my copy of the project:
flash-attn = {version = ">=2.0.0", optional=true}
diffs attached here: https://github.com/michaelfeil/infinity/discussions/582
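For context, requesting flash attention 2 through transformers typically looks roughly like this (a sketch only, assuming the SigLIP 2 model class supports the flash_attention_2 implementation; the exact wiring in my diffs differs):
import torch
from transformers import AutoModel

# Requires the flash-attn package, a CUDA device, and a half-precision dtype.
model = AutoModel.from_pretrained(
    "google/siglip2-large-patch16-384",
    torch_dtype=torch.float16,
    attn_implementation="flash_attention_2",
)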
Since this has been open for a while and my modifications still work perfectly, at least for me... are there any plans to merge this into main?