Warmup fails for embedding model
### System Info
Command: docker compose up
OS Version: Linux (Ubuntu)
Model: intfloat/multilingual-e5-large-instruct
docker compose file:

```yaml
services:
  infinity:
    image: michaelf34/infinity:latest-cpu
    command:
      - v2
      - --engine
      - optimum
      - --model-id
      - intfloat/multilingual-e5-large-instruct
      - --served-model-name
      - keyword-classifier
      - --port
      - "5000"
    ports:
      - "5000:5000"
```

docker image digest: `2a464dcc06e6`
### Information
- [x] Docker + cli
- [ ] pip + cli
- [ ] pip + usage of Python interface
### Tasks
- [ ] An officially supported CLI command
- [ ] My own modifications
### Reproduction
Steps to reproduce:
- Start docker compose with the example file
Expected behaviour:
- Server starts and warmup is executed
Actual behaviour:
- An error occurs: the warmup input sent to the model appears to exceed its maximum sequence length, i.e. the input is not truncated. Likely cause: the model is XLM-RoBERTa-based, where position ids are offset by the padding index, so a 512-token warmup input plus special tokens overruns the 514-entry position embedding table. A possible fix could be to truncate the warmup input, or to make sure it never exceeds the model's maximum length (a sketch follows the log below).
The error:

```
infinity-1 | INFO 2025-04-29 08:24:13,721 infinity_emb INFO: Getting select_model.py:97
infinity-1 | timings for batch_size=32 and avg tokens per
infinity-1 | sentence=3
infinity-1 | 4.91 ms tokenization
infinity-1 | 97.51 ms inference
infinity-1 | 0.16 ms post-processing
infinity-1 | 102.59 ms total
infinity-1 | embeddings/sec: 311.93
infinity-1 | 2025-04-29 08:24:13.751459785 [E:onnxruntime:, sequential_executor.cc:516 ExecuteKernel] Non-zero status code returned while running Gather node. Name:'/0/auto_model/embeddings/position_embeddings/Gather' Status Message: indices element out of data bounds, idx=514 must be within the inclusive range [-514,513]
infinity-1 | ERROR: Traceback (most recent call last):
infinity-1 | File "/app/.venv/lib/python3.10/site-packages/starlette/routing.py", line 693, in lifespan
infinity-1 | async with self.lifespan_context(app) as maybe_state:
infinity-1 | File "/usr/lib/python3.10/contextlib.py", line 199, in aenter
infinity-1 | return await anext(self.gen)
infinity-1 | File "/app/infinity_emb/infinity_server.py", line 88, in lifespan
infinity-1 | app.engine_array = AsyncEngineArray.from_args(engine_args_list) # type: ignore
infinity-1 | File "/app/infinity_emb/engine.py", line 306, in from_args
infinity-1 | return cls(engines=tuple(engines))
infinity-1 | File "/app/infinity_emb/engine.py", line 71, in from_args
infinity-1 | engine = cls(**engine_args.to_dict(), _show_deprecation_warning=False)
infinity-1 | File "/app/infinity_emb/engine.py", line 56, in init
infinity-1 | self._model_replicas, self._min_inference_t, self._max_inference_t = select_model(
infinity-1 | File "/app/infinity_emb/inference/select_model.py", line 99, in select_model
infinity-1 | loaded_engine.warmup(batch_size=engine_args.batch_size, n_tokens=512)
infinity-1 | File "/app/infinity_emb/transformer/abstract.py", line 93, in warmup
infinity-1 | return run_warmup(self, inp)
infinity-1 | File "/app/infinity_emb/transformer/abstract.py", line 233, in run_warmup
infinity-1 | embed = model.encode_core(feat)
infinity-1 | File "/app/infinity_emb/transformer/embedder/optimum.py", line 91, in encode_core
infinity-1 | outputs = self.model(**onnx_input)
infinity-1 | File "/app/.venv/lib/python3.10/site-packages/optimum/modeling_base.py", line 98, in call
infinity-1 | return self.forward(*args, **kwargs)
infinity-1 | File "/app/.venv/lib/python3.10/site-packages/optimum/onnxruntime/modeling_ort.py", line 1109, in forward
infinity-1 | onnx_outputs = self.model.run(None, onnx_inputs)
infinity-1 | File "/app/.venv/lib/python3.10/site-packages/onnxruntime/capi/onnxruntime_inference_collection.py", line 266, in run
infinity-1 | return self._sess.run(output_names, input_feed, run_options)
infinity-1 | onnxruntime.capi.onnxruntime_pybind11_state.InvalidArgument: [ONNXRuntimeError] : 2 : INVALID_ARGUMENT : Non-zero status code returned while running Gather node. Name:'/0/auto_model/embeddings/position_embeddings/Gather' Status Message: indices element out of data bounds, idx=514 must be within the inclusive range [-514,513]
infinity-1 |
infinity-1 | ERROR: Application startup failed. Exiting.
infinity-1 exited with code 3
```
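If truncation is the intended fix, here is a minimal sketch of what it could look like, assuming the standard Hugging Face tokenizer API (`encode_truncated` is an illustrative helper, not part of infinity_emb's actual code):

```python
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("intfloat/multilingual-e5-large-instruct")

def encode_truncated(texts, max_length=None):
    # Clip every sequence to the model's maximum length so position ids
    # can never index past the position embedding table.
    max_length = max_length or tokenizer.model_max_length  # 512 for this model
    return tokenizer(
        texts,
        padding=True,
        truncation=True,
        max_length=max_length,
        return_tensors="np",
    )
```

With `truncation=True` the tokenizer clips over-long inputs before they reach the ONNX session, which would avoid the out-of-bounds `Gather` shown above.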
Hi @molntamas,
thanks for the detailed report. You can disable the warmup using the `--no-model-warmup` flag.
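For reference, applied to the compose file from this report, that would look like the following (only the added flag is new):

```yaml
services:
  infinity:
    image: michaelf34/infinity:latest-cpu
    command:
      - v2
      - --engine
      - optimum
      - --model-id
      - intfloat/multilingual-e5-large-instruct
      - --served-model-name
      - keyword-classifier
      - --no-model-warmup
      - --port
      - "5000"
    ports:
      - "5000:5000"
```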