juney-nvidia

Results 117 comments of juney-nvidia

@byshiue please help review this LoRA-related MR. Also keeping @brb-nv in the loop for visibility.

@glara76 Hi Glara76, we encourage users to use the LLM API for inference, which provides a more convenient interface for end users. The documentation will be refined to...

Got it. Can you first try to make your multi-node run flow match the one [here](https://nvidia.github.io/TensorRT-LLM/architecture/core-concepts.html#llama-3-1-405b) to help narrow down the potential issue? @jinyangyuan-nvidia Hi Jinyang, when it is convenient for you,...

@jinyangyuan-nvidia Thanks for adding this feature, Jinyang. I noticed that this MR currently applies the "optionally split MoE inputs into chunks" change only to the DS R1 model. How much additional effort...

> Thanks June. This feature can be easily applied to other MoE models by refactoring the code. I will improve this PR accordingly.

Thanks, Jinyang! June

@FrankD412 @kaiyux @jiahanc Hi Frank/Kaiyu/Cyrus, can you help look into the question from the community? Thanks, June

@nv-guomingz Hi Guoming, can you help look into this issue first? Thanks, June