juney-nvidia
Thanks for submitting this PR, @brb-nv. I noticed that multiple E2E tests have been added. Do you have a rough estimate of the increased pre-merge time? I am asking this...
> > Thanks for submitting this PR, @brb-nv. I noticed that multiple E2E tests have been added. Do you have a rough estimate of the increased pre-merge time? I am...
@QiJune @byshiue can you help review this MR from @brb-nv, since I know you are now working to reduce the CI test time? Thanks, June
@geaned Thanks for providing the feedback. When we developed ReDrafter, it was a collaboration with a key customer based on the Medusa algorithm; later, new speculative decoding algorithms were invented and...
Hi @tonyay163, thanks for bringing this to our attention. It is true that prompt lookup speculative decoding is not exposed at the LLM API level for now. Recently we have been working...
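To clarify the idea for others reading along: prompt lookup decoding speculates by matching the most recent n tokens of the sequence against earlier occurrences in the prompt/generated text, and proposing the tokens that followed that earlier match as draft tokens, which the target model then verifies in a single forward pass. Below is a minimal, framework-independent sketch of that lookup step (illustrative only, not TensorRT-LLM's implementation; the function name, defaults, and token IDs are made up for the example):

```python
def prompt_lookup_draft(tokens, ngram_size=3, num_draft_tokens=5):
    """Propose draft tokens by matching the last `ngram_size` tokens
    against an earlier occurrence in the same sequence.

    Illustrative sketch of the general prompt-lookup idea only; not
    TensorRT-LLM's actual implementation.
    """
    if len(tokens) < ngram_size:
        return []
    pattern = tokens[-ngram_size:]
    # Scan earlier positions for the same n-gram, most recent match first,
    # excluding the pattern's own position at the end of the sequence.
    for start in range(len(tokens) - ngram_size - 1, -1, -1):
        if tokens[start:start + ngram_size] == pattern:
            follow = tokens[start + ngram_size:start + ngram_size + num_draft_tokens]
            if follow:
                return follow
    return []  # No match found: fall back to normal autoregressive decoding.


# The trailing n-gram [5, 6, 7] also appears earlier, so the two tokens that
# followed it there ([8, 9]) are proposed as draft tokens.
print(prompt_lookup_draft([1, 5, 6, 7, 8, 9, 2, 5, 6, 7], num_draft_tokens=2))
```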
@tonyay163 As @Superjomn said, we are now focusing on the PyTorch path to improve the ease of use of TensorRT-LLM (while still ensuring the best performance). Also, since there is already Prompt Lookup...
@LarryXFly for vis, since this can introduce new test cases for the QA pipeline. June
@lfr-0531 BTW, I notice that the GPQA test will be run at least twice. How long does it take to finish a single round of GPQA evaluation in this setting?...
@jiahanc also for vis on this MR, since it concerns trtllm-bench. June
@Sesameisgod Hi, TensorRT-LLM has two backends now: one based on TensorRT (the first workflow supported in TensorRT-LLM) and the other based on PyTorch (the newly supported workflow since the 0.17 release). For TensorRT...
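For reference, both backends are driven through the same high-level LLM API. A minimal sketch, assuming the documented LLM API entry point (the model name, and the `_torch` module path used for the PyTorch backend around the 0.17/0.18 releases, are assumptions; please check the examples shipped with your release):

```python
from tensorrt_llm import LLM, SamplingParams  # TensorRT-based workflow entry point

prompts = ["Hello, my name is"]
sampling_params = SamplingParams(max_tokens=32)

# TensorRT backend: builds/loads a TensorRT engine under the hood.
llm = LLM(model="TinyLlama/TinyLlama-1.1B-Chat-v1.0")
for output in llm.generate(prompts, sampling_params):
    print(output.outputs[0].text)

# PyTorch backend (the workflow supported since the 0.17 release) lives under a
# separate module path in releases of that era (assumption; verify for your version):
# from tensorrt_llm._torch import LLM as PyTorchLLM
# pyt_llm = PyTorchLLM(model="TinyLlama/TinyLlama-1.1B-Chat-v1.0")
```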