juney-nvidia
Thanks for submitting this PR, @brb-nv. I noticed that multiple E2E tests have been added. Do you have a rough estimate of the increased pre-merge time? I am asking this...
> > Thanks for submitting this PR, @brb-nv. I noticed that multiple E2E tests have been added. Do you have a rough estimate of the increased pre-merge time? I am...
@QiJune @byshiue can you help review this MR from @brb-nv, since I know you are now working to reduce the CI test time? Thanks, June
@geaned Thanks for providing the feedback. When we developed ReDrafter, it was a collaboration with a key customer based on the Medusa algorithm; later, new speculative decoding algorithms were invented and...
Hi @tonyay163, thanks for bringing this to our attention. It is true that prompt lookup speculative decoding is not exposed at the LLM API level for now. Recently we have been working...
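To clarify the idea for others reading along: prompt lookup decoding speculates by matching the most recent n tokens of the sequence against earlier occurrences in the prompt/generated text, and proposing the tokens that followed that earlier match as draft tokens, which the target model then verifies in a single forward pass. Below is a minimal, framework-independent sketch of that lookup step (illustrative only, not TensorRT-LLM's implementation; the function name, defaults, and token IDs are made up for the example):

```python
def prompt_lookup_draft(tokens, ngram_size=3, num_draft_tokens=5):
    """Propose draft tokens by matching the last `ngram_size` tokens
    against an earlier occurrence in the same sequence.

    Illustrative sketch of the general prompt-lookup idea only; not
    TensorRT-LLM's actual implementation.
    """
    if len(tokens) < ngram_size:
        return []
    pattern = tokens[-ngram_size:]
    # Scan earlier positions for the same n-gram, most recent match first,
    # excluding the pattern's own position at the end of the sequence.
    for start in range(len(tokens) - ngram_size - 1, -1, -1):
        if tokens[start:start + ngram_size] == pattern:
            follow = tokens[start + ngram_size:start + ngram_size + num_draft_tokens]
            if follow:
                return follow
    return []  # No match found: fall back to normal autoregressive decoding.


# The trailing n-gram [5, 6, 7] also appears earlier, so the two tokens that
# followed it there ([8, 9]) are proposed as draft tokens.
print(prompt_lookup_draft([1, 5, 6, 7, 8, 9, 2, 5, 6, 7], num_draft_tokens=2))
```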
@tonyay163 As @Superjomn said, we are now focusing on the PyTorch path to improve the ease of use of TensorRT-LLM (while still ensuring the best performance). Also, since there is already Prompt Lookup...
@LarryXFly for vis, since this can introduce new test cases for the QA pipeline. June
@lfr-0531 BTW, I notice that the GPQA test will be run at least twice. How long does it take to finish a single round of GPQA evaluation in this setting?...
@jiahanc also for vis on this MR, since it concerns trtllm-bench. June
@Sesameisgod Hi, TensorRT-LLM has two backends now: one based on TensorRT (the first workflow supported in TensorRT-LLM) and the other based on PyTorch (the newly supported workflow since the 0.17 release). For TensorRT...
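For reference, both backends are driven through the same high-level LLM API. A minimal sketch, assuming the documented LLM API entry point (the model name, and the `_torch` module path used for the PyTorch backend around the 0.17/0.18 releases, are assumptions; please check the examples shipped with your release):

```python
from tensorrt_llm import LLM, SamplingParams  # TensorRT-based workflow entry point

prompts = ["Hello, my name is"]
sampling_params = SamplingParams(max_tokens=32)

# TensorRT backend: builds/loads a TensorRT engine under the hood.
llm = LLM(model="TinyLlama/TinyLlama-1.1B-Chat-v1.0")
for output in llm.generate(prompts, sampling_params):
    print(output.outputs[0].text)

# PyTorch backend (the workflow supported since the 0.17 release) lives under a
# separate module path in releases of that era (assumption; verify for your version):
# from tensorrt_llm._torch import LLM as PyTorchLLM
# pyt_llm = PyTorchLLM(model="TinyLlama/TinyLlama-1.1B-Chat-v1.0")
```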