OlivierDehaene

119 comments of OlivierDehaene

It is possible that this is not the root cause, but there is an issue with these lines:

```python
offset = 0
if has_layer_past:
    offset = layer_past[0].shape[-2]
seq_len += offset
```
...
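For context, here is a minimal annotated sketch of how that offset is usually meant to behave with a KV cache (the names follow the snippet above; the helper and its surrounding attention code are assumptions for illustration, not code from the issue):

```python
# Illustrative sketch of standard KV-cache bookkeeping, not the upstream fix.
# layer_past[0] is assumed to hold cached keys of shape
# (batch, num_heads, past_len, head_dim).
def total_seq_len(seq_len, layer_past, has_layer_past):
    offset = 0
    if has_layer_past:
        offset = layer_past[0].shape[-2]  # positions already in the cache
    # New tokens occupy positions [offset, offset + seq_len), so rotary
    # embeddings must be indexed from `offset`, not 0. With left padding,
    # `offset` also counts pad tokens, which can shift positions incorrectly.
    return seq_len + offset
```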

We use left padding extensively on the serving side, as we have dynamic batching logic that batches sequences of very different lengths together. While the pad==256 example above seems...
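To make that concrete, here is a hedged sketch of left-padded batching (illustrative only, not the text-generation-inference implementation; the helper name and pad id are assumptions):

```python
import torch

def left_pad(sequences: list[list[int]], pad_id: int = 0):
    """Pad on the left so the last tokens of all sequences line up."""
    max_len = max(len(seq) for seq in sequences)
    input_ids = torch.full((len(sequences), max_len), pad_id, dtype=torch.long)
    attention_mask = torch.zeros((len(sequences), max_len), dtype=torch.long)
    for i, seq in enumerate(sequences):
        input_ids[i, max_len - len(seq):] = torch.tensor(seq)
        attention_mask[i, max_len - len(seq):] = 1
    # Position ids must skip the padding; otherwise position embeddings are
    # computed for pad tokens and the real tokens end up shifted.
    position_ids = attention_mask.cumsum(-1) - 1
    position_ids.masked_fill_(attention_mask == 0, 0)
    return input_ids, attention_mask, position_ids
```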

Hi @njhill! Nice, thanks for working on this! For now I have a fix on my text-generation-inference fork, as we have multiple NeoX models in prod and I need a fix...

Do you happen to have an AMD CPU?

@LLukas22 can you share more on this? The BERT CPU impl is almost exactly the same as the one in Candle Transformers. This might only be linked to the default...

Oh, that's expected given your gist: TEI does not batch on CPU (yet). That's a different issue altogether. Here the main problem is that MKL's sgemm is slower than whatever...
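As a rough way to sanity-check single-precision GEMM throughput on a given CPU, something like the following works (a hedged sketch assuming a NumPy build linked against MKL; the matrix size is arbitrary and this is not the benchmark used here):

```python
import time
import numpy as np

n = 2048
a = np.random.rand(n, n).astype(np.float32)
b = np.random.rand(n, n).astype(np.float32)
a @ b  # warm-up so lazy initialization doesn't skew the timing

runs = 10
start = time.perf_counter()
for _ in range(runs):
    a @ b
elapsed = (time.perf_counter() - start) / runs

flops = 2 * n**3  # multiply-adds in an n x n x n matmul
print(f"sgemm: {flops / elapsed / 1e9:.1f} GFLOP/s")
```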

#35 helps on AMD CPUs (20% faster on average), but it shouldn't really make a difference on Intel ones besides making it clear to MKL that we want to use...
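The truncated sentence doesn't show the exact settings, so purely as an illustration, these are the standard MKL/OpenMP threading knobs one might pin explicitly (an assumed example, not the contents of #35):

```python
import os

# These must be set before the MKL-backed library is first loaded.
# Values here are illustrative assumptions, not taken from the PR.
os.environ.setdefault("MKL_NUM_THREADS", str(os.cpu_count()))
os.environ.setdefault("OMP_NUM_THREADS", str(os.cpu_count()))
os.environ.setdefault("MKL_DYNAMIC", "FALSE")  # keep the thread pool fixed
```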

What exactly do you need for it to be supported? Is supporting per-token embeddings with compression enough?

This is definitely on our roadmap and will be tackled in the coming weeks. Here are the priorities right now:

1. Rewrite the scheduling code and cache multi-turn conversations. This...

It's possible that you are just missing the http:// scheme: `OTLP_ENDPOINT: http://tempo.monitoring:4317`. The traces you see are from the Python server, but it doesn't seem to collect the traces from...
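To illustrate why the scheme matters, here is a minimal OTLP trace-export setup in Python (a hedged sketch using the opentelemetry packages; it only mirrors the endpoint format and is not the project's actual tracing code):

```python
from opentelemetry import trace
from opentelemetry.sdk.trace import TracerProvider
from opentelemetry.sdk.trace.export import BatchSpanProcessor
from opentelemetry.exporter.otlp.proto.grpc.trace_exporter import OTLPSpanExporter

# Include the scheme explicitly, matching the endpoint suggested above.
exporter = OTLPSpanExporter(endpoint="http://tempo.monitoring:4317")
provider = TracerProvider()
provider.add_span_processor(BatchSpanProcessor(exporter))
trace.set_tracer_provider(provider)
```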