juney-nvidia
@byshiue can you help review this MR? Thanks June
@zmy1116 Hi, the error message mentioning `libtriton_tensorrt.so` indicates that you are trying to use the TensorRT backend to serve a specific model. And in TensorRT-LLM backend repo we haven't...
@byshiue @lfr-0531 can you help review this MR? Thanks June
@jiahanc Hi Cyrus, I think you are the right person to answer this question? :) cc @NVGaryJi for visibility as well.
> LGTM. I don't have any approve button though

Can you try again?
And let me trigger the CI, since this MR, although small, can affect the tests.
@Bihan Hi, prefix caching (KV cache reuse) is still being developed by our engineering team. I would expect it to land in the main branch in the upcoming weeks....
Thanks for contributing this fix, @WilliamTambellini. Let me trigger the CI now. June