OlivierDehaene

Results 119 comments of OlivierDehaene

What's your use case for these models? Their throughput is so low and the costs so prohibitive that I don't see one.

I can already tell you that inference times will be multiple orders of magnitude worse.

OK, I'm coming back to this issue and I will add it soon. @prasannakrish97, please don't mention other OSS projects here; that's bad etiquette.

ONNX on CPU is not supported, and it is generally faster than TEI (Candle) on CPU. We are working in Candle to make CPU inference faster, but it's more complicated than for other devices (like CUDA...

This statement is based on the benchmarks linked at the top of the readme and internal benchmarks done by our partners. If you want to replicate them, use `k6` with...
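
As a starting point, a minimal `k6` script for this kind of load test might look like the sketch below. The endpoint URL, port, payload shape, and load parameters (`vus`, `duration`) are assumptions for illustration, not the exact benchmark configuration:

```ts
// Minimal k6 load-test sketch against a local TEI instance.
// Assumes TEI is listening on http://localhost:8080 and exposes the /embed route.
import http from 'k6/http';
import { check } from 'k6';

export const options = {
  vus: 32,          // number of concurrent virtual users (assumed value)
  duration: '30s',  // total test length (assumed value)
};

export default function () {
  // One short sentence per request; adjust to match the benchmark workload.
  const payload = JSON.stringify({ inputs: 'What is Deep Learning?' });

  const res = http.post('http://localhost:8080/embed', payload, {
    headers: { 'Content-Type': 'application/json' },
  });

  check(res, { 'status is 200': (r) => r.status === 200 });
}
```

Run it with `k6 run script.ts` and compare the reported request rate and latency percentiles against the published numbers.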

Yes, Optimum Intel (and CPU inference in general) might be faster with other methods. The main focus of this repo is GPU inference for bulk embeddings.
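
To illustrate what "bulk embeddings" looks like in practice, here is a small TypeScript sketch that batches several sentences into a single `/embed` call; the local URL and port are assumptions, and error handling is kept minimal:

```ts
// Hedged sketch: send a batch of sentences to a local TEI instance in one request.
// The /embed route accepts either a single string or a list of strings under "inputs".
const sentences: string[] = [
  'What is Deep Learning?',
  'Text embeddings power semantic search.',
  'Batching amortizes GPU overhead across many inputs.',
];

async function embedBatch(inputs: string[]): Promise<number[][]> {
  const res = await fetch('http://localhost:8080/embed', {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify({ inputs }),
  });
  if (!res.ok) {
    throw new Error(`TEI returned ${res.status}: ${await res.text()}`);
  }
  // The response is one embedding vector per input sentence.
  return (await res.json()) as number[][];
}

embedBatch(sentences).then((vectors) => {
  console.log(`Got ${vectors.length} embeddings of dimension ${vectors[0].length}`);
});
```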

> I was curious if it is currently possible to point the embedding server to a "*.gguf" file

No, that's not possible to do yet. It was removed from the...

But overall that's definitely something we want to add. I just want to make sure we understand the advantages and drawbacks beforehand.

What is the typing error?

Indeed, it would be interesting to support more backends. Adding support for AMD GPUs will happen in Candle though, not in TEI directly.