
Infinity is a high-throughput, low-latency REST API for serving vector embeddings, supporting a wide range of text-embedding models and frameworks.

Results: 144 infinity issues

Hi, Thank you for your amazing work! We'd like to add an embedding template for users to deploy on RunPod, and we're deciding between Infinity and [HF's Text Embedding Inference](https://github.com/huggingface/text-embeddings-inference/tree/main)....

I am planning to evaluate hardware-agnostic options
- [ ] adapt poetry setup for optional deps
- [ ] build a Docker image for AMD MI250/300
- [ ]...

help wanted

In `encode_post` of `SentenceTransformerPatched` we have `embeddings = out_features.detach().cpu().to(torch.float32)`. On GPU, if I'm understanding correctly, `.cpu()` triggers a device-to-host synchronization, which will have to wait for...
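One way to soften that synchronization point, sketched below under the assumption that the tensor lives on a CUDA device: copy into pinned host memory with `non_blocking=True` instead of calling `.cpu()` directly. `to_cpu_async` is a hypothetical helper, not part of Infinity's codebase.

```python
import torch


def to_cpu_async(out_features: torch.Tensor) -> torch.Tensor:
    """Copy a tensor to host memory without an eager full-device sync.

    `non_blocking=True` only overlaps with compute when the destination
    is pinned; the caller must still synchronize the stream before
    reading the values. Hypothetical sketch, not Infinity's actual code.
    """
    if out_features.is_cuda:
        # Pinned (page-locked) destination enables an async DMA copy.
        host = torch.empty(
            out_features.shape, dtype=torch.float32, pin_memory=True
        )
        host.copy_(out_features.detach().to(torch.float32), non_blocking=True)
        return host
    # On CPU there is nothing to transfer; just cast.
    return out_features.detach().to(torch.float32)
```

The caller would then batch several copies and synchronize once (e.g. `torch.cuda.current_stream().synchronize()`) rather than stalling on every `.cpu()` call.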

help wanted

### Feature request
Have you ever thought of adding an API endpoint that could also serve as a TextSplitter? It would replace the need to load in memory the...

It would be great if we could load multiple models in the same container and switch between them using the model name

enhancement

Add support for MultiModal Inference / Clip

I built the following script based on reviewing yours... thanks, BTW. It only implements float16, but I'd love to incorporate other quantizations and/or a better transformer, or whatever else. Hope what I...

Instead of running one instance per model in the Dockerfile, could a list of models be provided at instantiation, with the model then chosen via the API request? The...
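The selection-per-request idea above could look like the following sketch, assuming an OpenAI-style `/embeddings` endpoint where a `model` field picks one of the loaded models. The URL, port, and model name are placeholders, not confirmed Infinity defaults.

```python
import json
import urllib.request

# Hypothetical request: a server that loaded several models could
# route on the "model" field of an OpenAI-style embeddings payload.
payload = {
    "model": "BAAI/bge-small-en-v1.5",  # placeholder model name
    "input": ["hello world"],
}
req = urllib.request.Request(
    "http://localhost:7997/embeddings",  # placeholder host/port
    data=json.dumps(payload).encode(),
    headers={"Content-Type": "application/json"},
)
# urllib.request.urlopen(req)  # would require a running server
```

The appeal of this design is that clients stay unchanged as models are added or removed; only the `model` string varies per request.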

Hi, I was wondering whether it would make sense to support models which, in addition to dense vectors, also produce sparse and ColBERT embeddings. For example, [BGE-M3](https://huggingface.co/BAAI/bge-m3) works well under infinity...

question
new model