infinity
Infinity is a high-throughput, low-latency REST API for serving vector embeddings, supporting a wide range of text-embedding models and frameworks.
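As a sketch of what a client request to such a server looks like, the snippet below builds an OpenAI-style embeddings request with only the standard library. The port `7997`, the endpoint path, and the model name are assumptions for illustration; adjust them to your deployment.

```python
import json
import urllib.request


def build_request(texts, model, base_url="http://localhost:7997"):
    # Assumed default port; the /embeddings path mirrors the shape of the
    # OpenAI embeddings API ({"model": ..., "input": [...]}).
    return urllib.request.Request(
        f"{base_url}/embeddings",
        data=json.dumps({"model": model, "input": texts}).encode(),
        headers={"Content-Type": "application/json"},
    )


def embed(texts, model, base_url="http://localhost:7997"):
    # POST the request and return one embedding vector per input text.
    with urllib.request.urlopen(build_request(texts, model, base_url)) as resp:
        return [row["embedding"] for row in json.load(resp)["data"]]
```

Calling `embed(["hello"], "BAAI/bge-small-en-v1.5")` against a running server would return a list with one float vector; the model name here is just an example.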
### System Info
os: linux, hardware: gpu, version: 0.0.75
### Information
- [x] Docker + cli
- [ ] pip + cli
- [ ] pip + usage of Python...
### System Info
docker: docker.io/michaelf34/infinity:0.0.75, not set, gpu
Model: `jinaai/jina-colbert-v2`
### Information
- [ ] Docker + cli
- [ ] pip + cli
- [ ] pip + usage...
### Feature request The Infinity Dockerfile is carefully constructed to avoid dragging its build artifacts into the final image, but unfortunately the base image `rocm/pytorch:rocm6.2.3_ubuntu22.04_py3.10_pytorch_release_2.3.0` is not, and it is...
### System Info
```
INFO 2025-03-13 13:02:38,529 datasets INFO: PyTorch version 2.4.1 available. config.py:54
Usage: infinity_emb v2 [OPTIONS]
Infinity API ♾️ cli v2. MIT License. Copyright (c) 2023-now Michael Feil
Multiple...
```
### Feature request It would be nice to have late chunking supported by the library, optionally activated by a parameter passed in the request. This feature is available, for example,...
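To make the request above concrete, here is a minimal, model-free sketch of the late-chunking idea: a long-context model first produces one contextualized embedding per token of the whole document, and chunk vectors are then pooled from those token embeddings rather than from independently embedded chunks. The token vectors below are toy stand-ins; a real model would supply them.

```python
def late_chunk(token_embeddings, chunk_spans):
    """Mean-pool contextualized token embeddings over each (start, end) span."""
    chunks = []
    for start, end in chunk_spans:
        span = token_embeddings[start:end]
        dim = len(span[0])
        # Average each dimension across the tokens of this chunk.
        chunks.append([sum(vec[d] for vec in span) / len(span) for d in range(dim)])
    return chunks


# Toy example: 4 "tokens" with 2-dim embeddings, split into two chunks.
tokens = [[1.0, 0.0], [3.0, 0.0], [0.0, 2.0], [0.0, 4.0]]
print(late_chunk(tokens, [(0, 2), (2, 4)]))  # → [[2.0, 0.0], [0.0, 3.0]]
```

The key difference from ordinary chunking is that the token embeddings already reflect the full document's context before pooling, so each chunk vector carries cross-chunk information.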
### Feature request infinity version: 0.0.75 I noticed that when a GPU OOM occurs, the service hangs and new requests cannot be executed. Could you provide a mechanism for the...
### System Info
Infinity latest CPU image.
### Information
- [x] Docker + cli
- [ ] pip + cli
- [ ] pip + usage of Python interface...
### Feature request For an embedding service in scenarios with high QPS (queries per second) and sensitivity to latency, if multiple requests are piled up in Infinity's request queue...
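One way such a feature could work is request prioritization at the queue level. This is not Infinity's actual scheduler, just a sketch of the technique: each enqueued request carries a priority, and the batcher pops the most urgent requests first so latency-sensitive traffic can jump ahead of bulk jobs.

```python
import heapq
import itertools


class PriorityRequestQueue:
    def __init__(self):
        self._heap = []
        # Monotonic counter breaks ties, preserving FIFO order per priority.
        self._counter = itertools.count()

    def push(self, request, priority=10):
        # Lower number = more urgent.
        heapq.heappush(self._heap, (priority, next(self._counter), request))

    def pop_batch(self, max_size):
        # Drain up to max_size requests, most urgent first.
        batch = []
        while self._heap and len(batch) < max_size:
            batch.append(heapq.heappop(self._heap)[2])
        return batch


q = PriorityRequestQueue()
q.push("bulk-reindex", priority=10)
q.push("user-query", priority=1)
q.push("bulk-reindex-2", priority=10)
print(q.pop_batch(2))  # → ['user-query', 'bulk-reindex']
```

A real implementation would also need to bound queue depth and shed or time out stale requests, which relates to the OOM-hang report above.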
Currently the endpoint for embedding models on the Infinity server is /embeddings, but I believe the correct OpenAI format would be /v1/embeddings? Correct me if I'm wrong.
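Until the paths are reconciled, a client-side normalizer is one workaround. The helper below is a sketch assuming the server only serves `/embeddings`: it maps either an OpenAI-style base URL ending in `/v1` or a bare host to the same endpoint.

```python
def embeddings_url(base_url):
    # Strip a trailing slash and an OpenAI-style "/v1" suffix, if present,
    # then append the path Infinity actually serves (assumed here).
    base = base_url.rstrip("/")
    if base.endswith("/v1"):
        base = base[: -len("/v1")]
    return base + "/embeddings"


print(embeddings_url("http://localhost:7997/v1"))  # → http://localhost:7997/embeddings
print(embeddings_url("http://localhost:7997"))     # → http://localhost:7997/embeddings
```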
### System Info I am using 0.0.75, and when I pass `--no-model-warmup` it still tries to warm up the model. Therefore CTRL+C to exit also does not work while warming...