infinity
Infinity is a high-throughput, low-latency REST API for serving vector embeddings, supporting a wide range of text-embedding models and frameworks.
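A minimal quickstart sketch against the REST API, assuming the default port 7997 and the OpenAI-style `/embeddings` route (check the project README for the exact CLI flags and routes):

```python
import requests

# Hedged example: port, route, and model name are assumptions,
# not confirmed from this page.
resp = requests.post(
    "http://localhost:7997/embeddings",
    json={"model": "BAAI/bge-small-en-v1.5", "input": ["hello world"]},
)
print(len(resp.json()["data"][0]["embedding"]))  # embedding dimension
```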
Hi, thank you for your amazing work! We'd like to add an embedding template for users to deploy on RunPod, and we're deciding between Infinity and [HF's Text Embedding Inference](https://github.com/huggingface/text-embeddings-inference/tree/main)...
I am planning to evaluate hardware-agnostic options:
- [ ] adapt the poetry setup for optional deps
- [ ] build a Docker image for AMD MI250/300
- [ ] ...
In `encode_post` of `SentenceTransformerPatched` we have

```
embeddings = out_features.detach().cpu().to(torch.float32)
```

On GPU, if I'm understanding correctly, `.cpu()` triggers a device-to-host synchronization which will have to wait for...
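For reference, a minimal sketch of the non-blocking alternative this concern points toward, assuming a CUDA device and a caller that synchronizes before reading the buffer; this is an illustration, not Infinity's actual code:

```python
import torch

def to_host_nonblocking(out_features: torch.Tensor) -> torch.Tensor:
    # Cast on-device, then copy into pinned (page-locked) host memory;
    # non_blocking=True only overlaps with compute when the destination
    # is pinned. The caller must torch.cuda.synchronize() before reading.
    src = out_features.detach().to(torch.float32)
    host = torch.empty(src.shape, dtype=src.dtype, device="cpu", pin_memory=True)
    host.copy_(src, non_blocking=True)
    return host
```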
### Feature request

Have you ever thought of adding an API endpoint that could also serve as a TextSplitter? It would replace the need to load in memory the...
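A sketch of the client-side splitting such an endpoint would replace, assuming "TextSplitter" refers to a LangChain-style splitter (the request is truncated, so this is a guess); chunk sizes are illustrative:

```python
from langchain_text_splitters import RecursiveCharacterTextSplitter

long_document = "Infinity serves embeddings over a REST API. " * 200
splitter = RecursiveCharacterTextSplitter(chunk_size=512, chunk_overlap=64)
chunks = splitter.split_text(long_document)
# Each chunk could then be sent to the embeddings endpoint in one batch.
print(len(chunks), len(chunks[0]))
```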
It would be great if we could load multiple models in the same container and switch between them by model name.
Add support for multimodal inference / CLIP
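A sketch of what CLIP-style multimodal embeddings look like, using Hugging Face `transformers` directly (an assumption for illustration; not Infinity's API):

```python
import requests
import torch
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

url = "http://images.cocodataset.org/val2017/000000039769.jpg"
image = Image.open(requests.get(url, stream=True).raw)
inputs = processor(text=["a photo of two cats"], images=image,
                   return_tensors="pt", padding=True)
with torch.no_grad():
    out = model(**inputs)
# Text and image embeddings share one space for similarity search.
print(out.text_embeds.shape, out.image_embeds.shape)
```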
I built the following script based on reviewing yours... thanks, BTW. It only implements float16, but I'd love to incorporate other quantizations and/or a better transformer, or whatever else. Hope what I...
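Since the script itself is truncated here, a hedged sketch of the float16 path being described, with an illustrative model name:

```python
import torch
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("BAAI/bge-small-en-v1.5", device="cuda")
model.half()  # cast weights to float16: halves memory, speeds up inference
with torch.inference_mode():
    emb = model.encode(["hello world"], convert_to_tensor=True)
print(emb.dtype, emb.shape)
```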
Instead of running one instance per model in the Dockerfile, could a list of models be provided at instantiation, with the model then chosen via the API request? The...
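A hypothetical server-side sketch of that dispatch, with a dict of preloaded models keyed by name and the per-request `model` field used for lookup (FastAPI and the model names are assumptions for illustration):

```python
from fastapi import FastAPI, HTTPException
from pydantic import BaseModel
from sentence_transformers import SentenceTransformer

# Load every requested model once at startup, keyed by name.
MODELS = {
    name: SentenceTransformer(name)
    for name in ["BAAI/bge-small-en-v1.5", "intfloat/e5-small-v2"]
}
app = FastAPI()

class EmbedRequest(BaseModel):
    model: str
    input: list[str]

@app.post("/embeddings")
def embed(req: EmbedRequest):
    # Route each request to the model named in its payload.
    if req.model not in MODELS:
        raise HTTPException(404, f"unknown model {req.model!r}")
    vecs = MODELS[req.model].encode(req.input).tolist()
    return {"data": [{"embedding": v} for v in vecs]}
```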
Hi, I was wondering whether it would make sense to support models which, in addition to dense vectors, also produce sparse and ColBERT vectors. For example, [BGE-M3](https://huggingface.co/BAAI/bge-m3) works well under infinity...
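For reference, BGE-M3's three output types can be produced with the FlagEmbedding library (shown here outside Infinity as an illustration):

```python
from FlagEmbedding import BGEM3FlagModel

model = BGEM3FlagModel("BAAI/bge-m3", use_fp16=True)
out = model.encode(
    ["hello world"],
    return_dense=True,
    return_sparse=True,
    return_colbert_vecs=True,
)
print(out["dense_vecs"].shape)       # (1, 1024) dense embedding
print(out["lexical_weights"][0])     # sparse token-weight mapping
print(out["colbert_vecs"][0].shape)  # (num_tokens, 1024) multi-vector
```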