infinity
Infinity is a high-throughput, low-latency REST API for serving vector embeddings, supporting a wide range of text-embedding models and frameworks.
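A minimal quickstart sketch against the REST API, assuming the default port 7997 and the OpenAI-style `/embeddings` route (check the project README for the exact CLI flags and routes):

```python
import requests

# Hedged example: port, route, and model name are assumptions,
# not confirmed from this page.
resp = requests.post(
    "http://localhost:7997/embeddings",
    json={"model": "BAAI/bge-small-en-v1.5", "input": ["hello world"]},
)
print(len(resp.json()["data"][0]["embedding"]))  # embedding dimension
```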
Hi, thank you for your amazing work! We'd like to add an embedding template for users to deploy on RunPod, and we're deciding between Infinity and [HF's Text Embedding Inference](https://github.com/huggingface/text-embeddings-inference/tree/main)...
I am planning to evaluate hardware-agnostic options:
- [ ] adapt the poetry setup for optional deps
- [ ] build a Docker image for AMD MI250/300
- [ ] ...
In `encode_post` of `SentenceTransformerPatched` we have

```
embeddings = out_features.detach().cpu().to(torch.float32)
```

On GPU, if I'm understanding correctly, `.cpu()` triggers a device-to-host synchronization which will have to wait for...
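For reference, a minimal sketch of the non-blocking alternative this concern points toward, assuming a CUDA device and a caller that synchronizes before reading the buffer; this is an illustration, not Infinity's actual code:

```python
import torch

def to_host_nonblocking(out_features: torch.Tensor) -> torch.Tensor:
    # Cast on-device, then copy into pinned (page-locked) host memory;
    # non_blocking=True only overlaps with compute when the destination
    # is pinned. The caller must torch.cuda.synchronize() before reading.
    src = out_features.detach().to(torch.float32)
    host = torch.empty(src.shape, dtype=src.dtype, device="cpu", pin_memory=True)
    host.copy_(src, non_blocking=True)
    return host
```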
### Feature request

Have you ever thought of adding an API endpoint that could also serve as a TextSplitter? It would replace the need to load in memory the...
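A sketch of the client-side splitting such an endpoint would replace, assuming "TextSplitter" refers to a LangChain-style splitter (the request is truncated, so this is a guess); chunk sizes are illustrative:

```python
from langchain_text_splitters import RecursiveCharacterTextSplitter

long_document = "Infinity serves embeddings over a REST API. " * 200
splitter = RecursiveCharacterTextSplitter(chunk_size=512, chunk_overlap=64)
chunks = splitter.split_text(long_document)
# Each chunk could then be sent to the embeddings endpoint in one batch.
print(len(chunks), len(chunks[0]))
```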
It would be great if we could load multiple models in the same container and switch between them by model name.
Add support for multimodal inference / CLIP
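A sketch of what CLIP-style multimodal embeddings look like, using Hugging Face `transformers` directly (an assumption for illustration; not Infinity's API):

```python
import requests
import torch
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

url = "http://images.cocodataset.org/val2017/000000039769.jpg"
image = Image.open(requests.get(url, stream=True).raw)
inputs = processor(text=["a photo of two cats"], images=image,
                   return_tensors="pt", padding=True)
with torch.no_grad():
    out = model(**inputs)
# Text and image embeddings share one space for similarity search.
print(out.text_embeds.shape, out.image_embeds.shape)
```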
I built the following script based on reviewing yours... thanks, BTW. It only implements float16, but I'd love to incorporate other quantizations and/or a better transformer, or whatever else. Hope what I...
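Since the script itself is truncated here, a hedged sketch of the float16 path being described, with an illustrative model name:

```python
import torch
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("BAAI/bge-small-en-v1.5", device="cuda")
model.half()  # cast weights to float16: halves memory, speeds up inference
with torch.inference_mode():
    emb = model.encode(["hello world"], convert_to_tensor=True)
print(emb.dtype, emb.shape)
```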
Instead of running one instance per model in the Dockerfile, could a list of models be provided at instantiation, with the model then chosen via the API request? The...
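A hypothetical server-side sketch of that dispatch, with a dict of preloaded models keyed by name and the per-request `model` field used for lookup (FastAPI and the model names are assumptions for illustration):

```python
from fastapi import FastAPI, HTTPException
from pydantic import BaseModel
from sentence_transformers import SentenceTransformer

# Load every requested model once at startup, keyed by name.
MODELS = {
    name: SentenceTransformer(name)
    for name in ["BAAI/bge-small-en-v1.5", "intfloat/e5-small-v2"]
}
app = FastAPI()

class EmbedRequest(BaseModel):
    model: str
    input: list[str]

@app.post("/embeddings")
def embed(req: EmbedRequest):
    # Route each request to the model named in its payload.
    if req.model not in MODELS:
        raise HTTPException(404, f"unknown model {req.model!r}")
    vecs = MODELS[req.model].encode(req.input).tolist()
    return {"data": [{"embedding": v} for v in vecs]}
```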
Hi, I was wondering whether it would make sense to support models which, in addition to dense vectors, also produce sparse and ColBERT vectors. For example, [BGE-M3](https://huggingface.co/BAAI/bge-m3) works well under infinity...
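For reference, BGE-M3's three output types can be produced with the FlagEmbedding library (shown here outside Infinity as an illustration):

```python
from FlagEmbedding import BGEM3FlagModel

model = BGEM3FlagModel("BAAI/bge-m3", use_fp16=True)
out = model.encode(
    ["hello world"],
    return_dense=True,
    return_sparse=True,
    return_colbert_vecs=True,
)
print(out["dense_vecs"].shape)       # (1, 1024) dense embedding
print(out["lexical_weights"][0])     # sparse token-weight mapping
print(out["colbert_vecs"][0].shape)  # (num_tokens, 1024) multi-vector
```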