juney-nvidia comments

Results: 117 comments by juney-nvidia

fix: Fix converting EXAONE when using model_weights_loader

@byshiue, can you help review this MR? Thanks, June

Can I use the Triton server TensorRT-LLM backend to host other TensorRT-built models? If not, what do you suggest if our model stack is a mix of LLM and non-LLM models?

@zmy1116 Hi, the error message mentioning `libtriton_tensorrt.so` indicates that you are trying to use the TensorRT backend to serve a specific model. And in the TensorRT-LLM backend repo we haven't...

fix: The constructor checks useDynamicTree but doesn’t validate dynamicTreeMaxTopK if set

@byshiue @lfr-0531, can you help review this MR? Thanks, June

How to reproduce 150 TPS using FP8 + MTP=0 + BSZ=1 on H200?

@jiahanc Hi Cyrus, I think you are the right person to answer this question? :) cc @NVGaryJi for vis also.

chore: Add second possible output for llava

> LGTM. I don't have any approve button though.

Can you try again?

chore: Add second possible output for llava

And let me trigger the CI, since this MR, although small, can affect the tests.

chore: Add second possible output for llava

/bot run

chore: Add second possible output for llava

/bot run

Does trtllm-serve enable prefix caching automatically with DeepSeek-R1?

@Bihan Hi, prefix caching (KV cache reuse) is still being developed by our engineering team. I expect it to land in the main branch in the upcoming weeks....

fix: Early exit cmake if find_library() does not find any lib

Thanks for contributing this fix, @WilliamTambellini. Let me trigger the CI now. June