juney-nvidia

117 comments by juney-nvidia

Thanks for submitting this PR @brb-nv. I noticed that multiple E2E tests have been added. Do you have a rough estimate of the increase in pre-merge time? I am asking this...

> > Thanks for submitting this PR @brb-nv. I noticed that multiple E2E tests have been added. Do you have a rough estimate of the increase in pre-merge time? I am...

@QiJune @byshiue can you help review this MR from @brb-nv, since I know you are now working to reduce the CI test time? Thanks, June

@geaned Thanks for providing the feedback. When we developed ReDrafter, it was a collaboration with a key customer, based on the Medusa algorithm. Later, new speculative decoding algorithms were invented and...

Hi @tonyay163, thanks for bringing this to our attention. It is true that prompt lookup speculative decoding is not currently exposed at the LLM API level. Recently we have been working...

@tonyay163 As @Superjomn said, we are now focusing on the PyTorch path to improve the ease of use of TensorRT-LLM (while still ensuring the best performance). Also, since there is already Prompt Lookup...

@LarryXFly for visibility, since this can introduce new test cases for the QA pipeline. June

@lfr-0531 BTW, I noticed that the GPQA test will be run at least twice. How long does it take to finish a single round of GPQA evaluation in this setting?...

@jiahanc for visibility on this MR about trtllm-bench as well. June

@Sesameisgod Hi, TensorRT-LLM has two backends now: one based on TensorRT (the first workflow supported in TensorRT-LLM) and the other based on PyTorch (the newly supported workflow since the 0.17 release). For TensorRT...