Kaiyu Xie
@Pzzzzz5142 Thanks for your contribution, we've integrated your fix into the internal codebase; it will be included in the next push to the GitHub main branch. We'll credit you as...
Hi @KuntaiDu , the official location of the `config.pbtxt` files for v0.11 is here: https://github.com/triton-inference-server/tensorrtllm_backend/tree/v0.11.0/all_models/inflight_batcher_llm. Before you launch tritonserver, you'll need to set several parameters; please follow...
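For context, the files under `all_models/inflight_batcher_llm` are templates whose `${...}` placeholders need to be filled in before tritonserver can load the model repository. Below is a minimal, assumption-laden sketch of that substitution step; the parameter names (`triton_max_batch_size`, `engine_dir`, etc.) are illustrative examples, and the repo also ships its own helper script for this, so treat the snippet as an outline rather than the official procedure:

```python
# Hypothetical sketch: fill the ${...} placeholders in copied config.pbtxt
# templates before starting tritonserver. Parameter names below are
# examples only; consult the backend docs for the authoritative list.
from pathlib import Path
from string import Template

# Your local copy of all_models/inflight_batcher_llm.
model_repo = Path("triton_model_repo")

# Example parameter values -- adjust to your engine build and deployment.
params = {
    "triton_max_batch_size": "64",
    "engine_dir": "/models/llama/trt_engines/fp16/1-gpu",
    "decoupled_mode": "True",
    "batching_strategy": "inflight_fused_batching",
}

for pbtxt in model_repo.rglob("config.pbtxt"):
    text = pbtxt.read_text()
    # safe_substitute leaves any placeholder we did not list untouched,
    # so unfilled values are easy to spot afterwards.
    pbtxt.write_text(Template(text).safe_substitute(params))
```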
@A-transformer Can you please also help us understand a bit more about why this change is necessary? Thanks!
/bot run --add-multi-gpu-test
Hi @tloen , the issue should be addressed by [this PR](https://github.com/NVIDIA/TensorRT-LLM/pull/2333). Can you please try it and see if that solves the problem? Feel free to let us know if there...
Hi @RobinJYM , `generation_time` here means the latency of the generation stage. So if I understand the question correctly, and you want the latency of the "rest tokens apart from the first token",...
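To make the distinction concrete, here is a small sketch of how the generation-stage latency relates to end-to-end latency and time-to-first-token; the variable names and numbers are illustrative, not actual benchmark output fields:

```python
# Illustrative arithmetic: splitting end-to-end latency into the
# first-token (context/prefill) part and the generation part.
e2e_latency_ms = 1250.0          # total request latency
time_to_first_token_ms = 180.0   # up to and including the first token
num_output_tokens = 128

# Latency attributed to all tokens after the first one.
generation_time_ms = e2e_latency_ms - time_to_first_token_ms

# Average per-token latency during the generation stage.
per_token_ms = generation_time_ms / (num_output_tokens - 1)

print(f"generation_time = {generation_time_ms:.1f} ms, "
      f"~{per_token_ms:.2f} ms/token")
```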
> @kaiyux Could you advise what would be the approach for external contribution here? Since we have not switched to GitHub-based development for this repo yet, we'll need someone to...
> Thanks @kaiyux! I can help with integrating this into the internal repo once the changes are finalized. What steps need to be taken to properly credit the contributor? We do...
Hi @xwuShirley, thanks for your attention. There are some changes we haven't pushed to the main branch yet; we will keep you posted.
/bot run --skip-test