OlivierDehaene

Results: 119 comments of OlivierDehaene

Ok then I'm not sure there is a lot that can be done here besides adding some documentation to explain this issue in the README/docs.

Setting these values correctly would be really hard since they are MKL/runtime specific. Plus, they should be set before execution, so this implies creating a launching script above the TEI...
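
For illustration, a minimal launcher sketch in Rust that sets thread-related variables before spawning the server. The binary name, flags, and variable names/values here are assumptions for the example, not recommendations; the right values depend on the MKL/OpenMP runtime actually linked into the binary.

```rust
use std::process::Command;

fn main() -> std::io::Result<()> {
    // Hypothetical launcher: set runtime env vars, then exec the real server.
    // Variable names and values below are placeholders only.
    let status = Command::new("text-embeddings-router") // assumed binary name
        .env("OMP_NUM_THREADS", "1")
        .env("MKL_NUM_THREADS", "1")
        .env("MKL_DYNAMIC", "FALSE")
        .args(["--model-id", "some-org/some-embedding-model"]) // placeholder
        .status()?;
    std::process::exit(status.code().unwrap_or(1));
}
```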

Just to chime in on the discussion: at Hugging Face we developed a dedicated backend to handle dynamic batching for the LLMs hosted on our platform: [text-generation-inference](https://github.com/huggingface/text-generation-inference). This backend works by...
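
As a rough sketch of the dynamic batching idea (the actual text-generation-inference scheduler is far more involved; `Request`, `batching_loop`, and `max_batch_size` below are made-up names for this example): requests are pushed onto a queue, and a loop drains whatever is waiting and runs it as a single batched forward pass instead of serving requests one by one.

```rust
use std::sync::mpsc::{Receiver, Sender};
use std::time::Duration;

// Hypothetical request type: a prompt plus a channel to send the result back.
struct Request {
    prompt: String,
    response: Sender<String>,
}

// Simplified batching loop: block for the first request, then opportunistically
// grab any requests already queued, up to `max_batch_size`.
fn batching_loop(queue: Receiver<Request>, max_batch_size: usize) {
    loop {
        let first = match queue.recv() {
            Ok(r) => r,
            Err(_) => return, // all senders dropped, shut down
        };
        let mut batch = vec![first];
        while batch.len() < max_batch_size {
            match queue.recv_timeout(Duration::from_millis(1)) {
                Ok(r) => batch.push(r),
                Err(_) => break,
            }
        }
        // Placeholder for the actual batched model call.
        for req in batch {
            let _ = req.response.send(format!("generated for: {}", req.prompt));
        }
    }
}
```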

Yes, this is very close to what I had in mind! If you are ok with this, I will continue your work in a fork of your fork tomorrow and...

[This is roughly what I had in mind](https://github.com/huggingface/text-generation-inference/pull/36). There are still some things to iron out. One question I have is regarding the API. I think the only events we...

@yk, I'm done with my implementation [here](https://github.com/huggingface/text-generation-inference/pull/36). Does the following SSE event signature cover your use case?

```rust
struct Details {
    finish_reason: String,
    generated_tokens: u32,
    seed: Option<u64>,
}

struct StreamResponse {...
```
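
For concreteness, here is a small, self-contained sketch of how such a `Details` payload could end up on an SSE `data:` line. It assumes `serde` and `serde_json` as dependencies, and the field values are made up for the example; the exact struct in the PR may differ.

```rust
use serde::Serialize;

// Simplified copy of the struct above, made serializable for the example.
#[derive(Serialize)]
struct Details {
    finish_reason: String,
    generated_tokens: u32,
    seed: Option<u64>,
}

fn main() {
    let details = Details {
        finish_reason: "length".to_string(),
        generated_tokens: 20,
        seed: Some(42),
    };
    // An SSE event is a `data:` line followed by a blank line.
    println!("data: {}\n", serde_json::to_string(&details).unwrap());
}
```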

Contrastive search is not supported by text-generation-inference.

Nice! I think that makes a lot more sense than the current naive algorithm and is easier to represent mentally. I need to think about your implementation and maybe play...

> In my experience, 8bit is around 8x slower compared to fp16

Yes, bitsandbytes adds a lot of CPU overhead and its kernels are slower than the native ones. It...

In what setting is this called twice? In my opinion, this should crash.