Zongfei Jing

Results: 15 comments of Zongfei Jing

- Question 1: you can refer to [this link](https://nvidia.github.io/TensorRT-LLM/advanced/lora.html#lora-module-id-mapping) for the definition of the LoRA modules. `attn_qkv` is a combined QKV adapter.
- Question 2: since the GPU cache is set...
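To illustrate what "combined QKV adapter" means, here is a minimal sketch. The shapes and the stacking order are illustrative assumptions, not the exact TensorRT-LLM weight serialization format; see the linked module-id mapping page for the authoritative layout.

```python
import numpy as np

# Illustrative sketch only: a combined `attn_qkv` LoRA adapter replaces
# three separate q/k/v adapters by stacking their low-rank B matrices,
# so a single matmul serves all three projections.
hidden, rank = 64, 8

# Hypothetical separate low-rank B matrices for the q, k and v projections.
b_q = np.random.randn(hidden, rank)
b_k = np.random.randn(hidden, rank)
b_v = np.random.randn(hidden, rank)

# The combined qkv adapter stacks them along the output dimension.
b_qkv = np.concatenate([b_q, b_k, b_v], axis=0)  # shape (3 * hidden, rank)
print(b_qkv.shape)
```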

To perform inference with a specific LoRA for the first time, `lora_task_id`, `lora_weights`, and `lora_config` must all be given. The LoRA will be cached, so that subsequent requests for the...
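The caching rule above can be sketched as follows. This is a toy model of the behavior, with hypothetical class and parameter names; it is not the TensorRT-LLM server implementation, only an illustration of "first request needs everything, later requests need only the task id".

```python
# Minimal sketch (hypothetical names, illustrative only) of the caching
# rule: the first request for a given LoRA must carry lora_task_id,
# lora_weights and lora_config; once the LoRA is cached, subsequent
# requests may send lora_task_id alone.
class LoraCacheSketch:
    def __init__(self):
        self._cache = {}  # lora_task_id -> (weights, config)

    def request(self, lora_task_id, lora_weights=None, lora_config=None):
        if lora_task_id not in self._cache:
            if lora_weights is None or lora_config is None:
                raise ValueError(
                    "first request for a LoRA needs weights and config")
            self._cache[lora_task_id] = (lora_weights, lora_config)
        # Subsequent requests hit the cache using only the task id.
        return self._cache[lora_task_id]

cache = LoraCacheSketch()
# First request: all three pieces are required (placeholder values).
cache.request(0, lora_weights=b"placeholder", lora_config={"module": "attn_qkv"})
# Later request: the task id alone is enough.
weights, config = cache.request(0)
```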

As for the performance degradation you mentioned, is the output still correct? And by how much has performance decreased? Thanks.

In the bug description, I did not see which LoRA was used; could you please tell me? It would be better to provide the Hugging Face link of the base model and...

Hi @TheCodeWrangler, have you solved this issue in the latest version? If not, could you please provide a script to reproduce it?