juney-nvidia comments

Results: 117 comments of juney-nvidia

feat: Add LoRA support for gemma

@byshiue Please help review this LoRA-related MR. Also keeping @brb-nv in the loop for visibility.

CUDA Device Binding Runtime Error When Running GPT-3 in Multi-Node Mode Using Slurm

@glara76 Hi Glara76, we encourage users to use the LLM API for inference, since it provides a more convenient interface for end users. The documentation will be refined to...

CUDA Device Binding Runtime Error When Running GPT-3 in Multi-Node Mode Using Slurm

Got it. Could you first try making your multi-node run flow match the one described [here](https://nvidia.github.io/TensorRT-LLM/architecture/core-concepts.html#llama-3-1-405b), to help narrow down the potential issue? @jinyangyuan-nvidia Hi Jinyang, when you are convenient,...

feat: Optionally split MoE inputs into chunks to reduce GPU memory usage

@jinyangyuan-nvidia Thanks for adding this feature, Jinyang. I noticed that this MR currently applies the "optionally split MoE inputs into chunks" option only to the DS R1 model. How much additional effort...

feat: Optionally split MoE inputs into chunks to reduce GPU memory usage

> Thanks, June. This feature can easily be applied to other MoE models by refactoring the code. I will improve this PR accordingly.

Thanks, Jinyang!
June

[Question] Modifying the Batch Scheduling Policy in the trtllm-bench CLI

@FrankD412 @kaiyux @jiahanc Hi Frank/Kaiyu/Cyrus, could you help answer this question from the community? Thanks, June

InternLM2 encounters an error when the batch size exceeds 16

@nv-guomingz Hi Guoming, could you take a first look at this issue? Thanks, June