infinity
Infinity is a high-throughput, low-latency REST API for serving vector embeddings, supporting a wide range of text-embedding models and frameworks.
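Infinity serves embeddings over an OpenAI-compatible `/embeddings` endpoint; a minimal client sketch follows. The base URL, port `7997`, and model name are illustrative assumptions, and the response parsing assumes the OpenAI embeddings response shape (`data[i].embedding`, ordered by `data[i].index`):

```python
import json
from urllib import request


def extract_embeddings(body):
    """Pull embedding vectors out of an OpenAI-style /embeddings response,
    ordered by each item's index."""
    return [item["embedding"]
            for item in sorted(body["data"], key=lambda d: d["index"])]


def embed(texts, base_url="http://localhost:7997",
          model="BAAI/bge-small-en-v1.5"):
    """POST a batch of texts to a running Infinity server.

    base_url, port, and model are placeholders; substitute whatever
    you passed to `v2 --model-id ... --port ...` when starting the server.
    """
    payload = json.dumps({"model": model, "input": texts}).encode()
    req = request.Request(
        f"{base_url}/embeddings",
        data=payload,
        headers={"Content-Type": "application/json"},
    )
    with request.urlopen(req) as resp:
        return extract_embeddings(json.load(resp))
```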
### Model description Hi, many thanks for your great work bringing infinity-emb to life; it solved a ton of problems for me and saved me a lot of time! However, I tried...
## Request I am suggesting/requesting a documentation section noting that disabling the vector disk cache can bring a significant performance boost when throughput is particularly high (if I am correct). ## Context...
### System Info When running the Infinity CPU docker image with the optimum engine and an ONNX model, memory usage temporarily spikes very high. For example, with the model...
### System Info
Command: docker compose up
OS Version: linux, ubuntu
Model: intfloat/multilingual-e5-large-instruct
docker compose file:
```
services:
  infinity:
    image: michaelf34/infinity:latest-cpu
    command:
      - v2
      - --engine
      - optimum
      - --model-id...
```
### Feature request Hello, First of all, thank you for developing infinity, an excellent package dedicated to inference for embedding models. I am opening this issue to request support once...
## Description This PR integrates the OpenVINO backend into Infinity's Optimum Embedder class via the [optimum-intel](https://github.com/huggingface/optimum-intel/tree/main) library. ## Related Issue If applicable, link the issue this...
Hi, in my setup I am embedding images in bulk (1000 images/request) with 1 T4 and 40 CPUs on Modal. With the normal embedding call, embedding 1000 images takes **55s**...
### Model description Hi there :) I have been using the `mixedbread-ai/mxbai-rerank-base-v1` served with Infinity via Runpod for some time now. However, mixedbread has released a v2 version: https://huggingface.co/mixedbread-ai/mxbai-rerank-base-v2, for...
When I run
```bash
port=3000
model1=Salesforce/SFR-Embedding-Code-2B_R
volume=$PWD/data

docker run -it --gpus device=0 \
  -v $volume:/app/.cache \
  -p $port:$port \
  michaelf34/infinity:latest \
  v2 \
  --model-id $model1 \
  --port $port \
  --model-warmup...
```