infinity
Infinity is a high-throughput, low-latency REST API for serving vector embeddings, supporting a wide range of text-embedding models and frameworks.
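As a rough illustration of what a client call to such a REST API could look like, the sketch below builds (but does not send) an OpenAI-style embeddings request using only the Python standard library. The base URL `http://localhost:7997`, the `embeddings` route, the model name, and the helper `build_embedding_request` with its `prefix` parameter are all assumptions made for illustration, not confirmed details of the server; check them against the deployment you are using.

```python
import json
import urllib.request


def build_embedding_request(texts, model, base_url="http://localhost:7997", prefix=""):
    """Build (without sending) a POST request for an OpenAI-style embeddings route.

    Hypothetical helper: base_url, the "embeddings" route, and the optional
    URL prefix (e.g. "v1") are assumptions, not documented server behavior.
    """
    # Join the optional prefix and the route, skipping empty segments.
    segments = [s for s in (prefix.strip("/"), "embeddings") if s]
    url = base_url.rstrip("/") + "/" + "/".join(segments)
    # OpenAI-style payload: a model name plus a list of input strings.
    body = json.dumps({"model": model, "input": list(texts)}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


req = build_embedding_request(["hello world"], model="my-embedding-model")
print(req.full_url)
```

Passing `prefix="v1"` yields a URL ending in `/v1/embeddings`. A route mismatch of this kind is one plausible cause of a 404 response, so it is worth confirming which paths a given server actually exposes before sending the request with `urllib.request.urlopen(req)`.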
### Feature request [Kserve](https://github.com/kserve/kserve) is a Kubernetes-based engine for predictive and generative AI models and provides an abstraction for popular model servers such as Hugging Face TEI (https://github.com/kserve/kserve/pull/3743), TensorFlow, PyTorch, etc. Request to...
### Feature request Image embeddings and audio embeddings currently have insufficient coverage in the OpenAI API server. ### Motivation It would be great to have a test testing this end-to-end...
### System Info infinity_emb v2 --model_id /home/xxxx/peg_onnx --served-model-name embedding --engine optimum --device tensorrt --batch-size 32 OS: linux model_base PEG nvidia-smi: cuda version 11.8, tensorrt: 8.6.1 ### Information - [ ]...
### Model description When I loaded the embedding model and tested the request, it returned a 404 status code. Is this because Infinity does not support requests in the form...
### Feature request Prepend v1 to OpenAI-compatible APIs ### Motivation This allows us to integrate Infinity into KubeAI the same way as other OpenAI-compatible API engines: https://github.com/substratusai/kubeai PR:...
### Feature request --dtype support for rerankers ### Motivation Easily quantize cross-encoder models ### Your contribution (will look into this)
### System Info py3.10 infinity-emb 0.0.55 Running with optimum engine fails: ``` INFO 2024-09-13 15:17:02,874 datasets INFO: PyTorch version 2.4.0 available. config.py:59 INFO: Started server process [76741] INFO: Waiting for...
### System Info py3.10 infinity-emb 0.0.55 ``` INFO 2024-09-13 15:19:59,927 datasets INFO: PyTorch version 2.4.0 available. config.py:59 INFO: Started server process [76898] INFO: Waiting for application startup. INFO 2024-09-13 15:20:01,042...
### Model description I used `michaelf34/infinity:0.0.55` to deploy the mixed_bread_large reranker. The container is up and I am able to ping the model using Python requests, but it is a...
### Model description Hi, thanks for your source code. Could you add support for deploying colbertv2.0? Thank you! ### Open source status - [ ] The model implementation is available...