Pernekhan Utemuratov
Could you share a rough timeline for FP8 quantization support for the Mixtral (MoE) model? cc: @Tracin
trtllm crashes when I send long-context requests that are within the `max-input-length` limit. I believe this happens when the pending requests' total token count reaches the `max-num-tokens` limit. But why isn't it queuing requests...
**Is the feature request related to a problem?** Currently, there is no benchmarking for multi-turn conversations. Sometimes the assistant needs to ask for more information before calling the functions. For example:...
TensorRT-LLM has more stats for the KV cache, but the backend doesn't expose them. Can we add the missing ones in next week's commits?
```cpp
struct KvCacheStats {
    SizeType32 maxNumBlocks;
    SizeType32...
```