OlivierDehaene

Results 119 comments of OlivierDehaene

What's your use case for these models? Their throughput is so low and the costs so prohibitive that I don't see one.

I can already tell you that inference times will be multiple orders of magnitude worse.

OK, I'm coming back to this issue and I will add it soon. @prasannakrish97, please don't mention other OSS projects here; that's bad etiquette.

ONNX on CPU is not supported, and it is generally faster than TEI (Candle) on CPU. We are working in Candle to make CPU inference faster, but it's more complicated than for other devices (like CUDA...

This statement is based on the benchmarks linked at the top of the readme and internal benchmarks done by our partners. If you want to replicate them, use `k6` with...
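
As a starting point, a minimal `k6` script for this kind of load test might look like the sketch below. The endpoint URL, port, payload shape, and load parameters (`vus`, `duration`) are assumptions for illustration, not the exact benchmark configuration:

```ts
// Minimal k6 load-test sketch against a local TEI instance.
// Assumes TEI is listening on http://localhost:8080 and exposes the /embed route.
import http from 'k6/http';
import { check } from 'k6';

export const options = {
  vus: 32,          // number of concurrent virtual users (assumed value)
  duration: '30s',  // total test length (assumed value)
};

export default function () {
  // One short sentence per request; adjust to match the benchmark workload.
  const payload = JSON.stringify({ inputs: 'What is Deep Learning?' });

  const res = http.post('http://localhost:8080/embed', payload, {
    headers: { 'Content-Type': 'application/json' },
  });

  check(res, { 'status is 200': (r) => r.status === 200 });
}
```

Run it with `k6 run script.ts` and compare the reported request rate and latency percentiles against the published numbers.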

Yes, Optimum Intel (and CPU inference in general) might be faster with other methods. The main focus of this repo is GPU inference for bulk embeddings.
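
To illustrate what "bulk embeddings" looks like in practice, here is a small TypeScript sketch that batches several sentences into a single `/embed` call; the local URL and port are assumptions, and error handling is kept minimal:

```ts
// Hedged sketch: send a batch of sentences to a local TEI instance in one request.
// The /embed route accepts either a single string or a list of strings under "inputs".
const sentences: string[] = [
  'What is Deep Learning?',
  'Text embeddings power semantic search.',
  'Batching amortizes GPU overhead across many inputs.',
];

async function embedBatch(inputs: string[]): Promise<number[][]> {
  const res = await fetch('http://localhost:8080/embed', {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify({ inputs }),
  });
  if (!res.ok) {
    throw new Error(`TEI returned ${res.status}: ${await res.text()}`);
  }
  // The response is one embedding vector per input sentence.
  return (await res.json()) as number[][];
}

embedBatch(sentences).then((vectors) => {
  console.log(`Got ${vectors.length} embeddings of dimension ${vectors[0].length}`);
});
```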

> I was curious if it is currently possible to point the embedding server to a "*.gguf" file

No, that's not possible to do yet. It was removed from the...

But overall that's definitely something we want to add. I just want to make sure we understand the advantages and drawbacks beforehand.

What is the typing error?

Indeed, it would be interesting to support more backends. Adding support for AMD GPUs will happen in Candle though, not in TEI directly.