text-embeddings-inference
A blazing fast inference solution for text embeddings models
### Feature request The OpenAI API `/embeddings` endpoint accepts input both as text (a list of strings) and as [tokenized input](https://github.com/openai/openai-openapi/blob/893ba52242dbd5387a97b96444ee1c742cfce9bd/openapi.yaml#L8832-L8850) (a list of integers). text-embeddings-inference should also support lists of integers (token IDs)...
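For illustration, a minimal sketch of the two input shapes, assuming a local TEI instance with an OpenAI-compatible `/embeddings` route on port 8080; the token IDs below are placeholders, not real tokenizer output:

```python
import requests

BASE = "http://localhost:8080"  # assumed local TEI instance

# Input as a list of strings (already supported).
requests.post(f"{BASE}/embeddings", json={
    "model": "BAAI/bge-small-en-v1.5",
    "input": ["first passage", "second passage"],
})

# Input as lists of token IDs -- the shape this feature request asks TEI to accept.
# The IDs here are placeholders, not real tokenizer output.
requests.post(f"{BASE}/embeddings", json={
    "model": "BAAI/bge-small-en-v1.5",
    "input": [[101, 7592, 102], [101, 2129, 2024, 102]],
})
```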
### System Info Version: v1.4.0 Cargo version: cargo 1.79.0 (ffa9cf99a 2024-06-03) GCC version: 11.4.1 GPU: compiled with CUDA_COMPUTE_CAP=86 on a machine without a GPU (but with CUDA 12.1). I plan to use...
# What does this PR do? - **Change**: Moves the functions `batch` and `sort_embeddings` from `backends/candle/tests/` to `backends/candle`. - **Motivation**: Crates consuming `text-embeddings-inference` as a dependency (and not as a standalone server)...
### System Info Sample Docker Compose file:

```yaml
embedding:
  image: ghcr.io/huggingface/text-embeddings-inference:cpu-1.0
  platform: linux/amd64
  volumes:
    - embed_data:/data
  command: --model-id BAAI/bge-small-en-v1.5
  ports:
    - "8080:80"
```

When hitting the endpoint `/embed` over and over...
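As a quick sanity check, a minimal sketch of calling the container's `/embed` route from the host, assuming the service above is up and container port 80 is mapped to host port 8080 as in the compose file:

```python
import requests

resp = requests.post(
    "http://localhost:8080/embed",
    json={"inputs": "What is Deep Learning?"},
)
resp.raise_for_status()
print(resp.json())  # one embedding vector per input
```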
### System Info Hi, on Hugging Face Inference Endpoints, TGI works for classifiers, but it doesn't work here. Is the DeBERTa v3 classifier not supported? ### Information - [ ]...
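For context, this is roughly how a sequence-classification model would be queried once loaded, assuming TEI's `/predict` route and a local instance on port 8080; whether a DeBERTa v3 classifier loads at all is exactly what this issue asks:

```python
import requests

# Hedged sketch: /predict is TEI's route for classification models.
resp = requests.post(
    "http://localhost:8080/predict",
    json={"inputs": "I like you. I love you."},
)
print(resp.json())  # expected: a list of {label, score} pairs
```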
### System Info text-embeddings-inference:1.3.0 ### Information - [X] Docker - [ ] The CLI directly ### Tasks - [ ] An officially supported command - [ ] My own modifications...
### System Info I am mostly working with the ghcr.io/huggingface/text-embeddings-inference:cpu-1.2 Docker image on macOS. Currently, I am only trying to find out which reranker models with a context size...
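One way to check a running model's context size is TEI's `GET /info` route; a minimal sketch, assuming a local instance on port 8080 and that the response carries `model_id` and `max_input_length` fields (field names to verify against the running server):

```python
import requests

info = requests.get("http://localhost:8080/info").json()
# Field names below are assumptions to verify against the actual response.
print(info.get("model_id"), "max input length:", info.get("max_input_length"))
```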
### System Info Image: v1.2 CPU. Model used: jinaai/jina-embeddings-v2-base-de. Deployment: Docker / RH OpenShift ### Information - [X] Docker - [ ] The CLI directly ### Tasks - [X] An...
### Feature request Support BAAI/bge-reranker-v2-minicpm-layerwise. ### Motivation BAAI/bge-reranker-v2-minicpm-layerwise inference is very slow with the default approach. ### Your contribution None