juney-nvidia comments


Unable to run Deepseek R1 on blackwell

@pankajroark Hi, have you tried the latest main branch, or followed [this](https://github.com/NVIDIA/TensorRT-LLM/tree/main/examples/deepseek_v3#running-the-benchmark) guide, to see whether the issue still exists? June

Unable to run Deepseek R1 on blackwell

> Yes, in fact, these assertions are unnecessary. I will file a PR soon to fix it. Thanks, Chang! June

feat:enable kvcache to be reused during request generation

@WeiHaocheng @dc3671 @thorjohnsen Hi Fred/Zhenhuan/Thor, can you help review this PR from the community? Thanks, June

[DRAFT] Introducing multi-vocab token sampling for audio generation

/bot run

chore: better quantization calibration loop for modelopt

@michaelfeil Thanks for submitting the MR. TRT-LLM has just moved to a GitHub-first workflow to make community engagement easier. Can you help rebase your MR based on the latest...

feat: deepseek_v1 gqa and correct normalization mode

@akhoroshev Hi, we plan to deprecate DS V1/V2 support and keep only V3/R1 model support, so we may not accept this MR for now. Thanks, June

How to implement attention when query and value have different hidden dims?

@ming-wei Hi Ming, do you have any suggestions for this question? Thanks, June

feat: Open source fp8_blockscale_gemm

> QQ any benchmark compared with DeepGEMM on Hopper and Blackwell? Thanks.

DeepGEMM only supports Hopper WGMMA for now, so we cannot use it directly on Blackwell. June

feat: Open source fp8_blockscale_gemm

@nv-guomingz @tongyuantongyu can you help review? cc @jiahanc for visibility on this Hopper-related effort.

feat: Open source fp8_blockscale_gemm

> > DeepGEMM only support Hopper WGMMA now, and on Blackwell we cannot directly use it.
>
> Hi @juney-nvidia Thanks for the reply. Do you recommend to use DeepGEMM...