bhsueh_NV
It is at inflight_batcher_llm/preprocessing/1/model.py
Could you try printing the related variables in https://github.com/triton-inference-server/tensorrtllm_backend/blob/v0.7.1/all_models/inflight_batcher_llm/preprocessing/1/model.py#L210?
Because we cannot reproduce your issue, we cannot provide a timeline for a fix. Also, could you try the latest main branch? There have been many updates since v0.7.1.
Could you share the full reproduction steps instead of only the scripts for launching the server? Also, please check again that you are really using the latest main branch. For example,...
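If it helps, here is a rough sketch of the kind of debug prints meant here. The tensor name `QUERY` comes from the preprocessing config, but the exact variables in scope around L210 depend on your checkout, so adapt the names to whatever is actually there:

```python
# Rough sketch only: add prints around the line in question so the inputs can
# be compared against a working setup. Variable/tensor names are illustrative.
import triton_python_backend_utils as pb_utils

class TritonPythonModel:
    def execute(self, requests):
        responses = []
        for idx, request in enumerate(requests):
            query = pb_utils.get_input_tensor_by_name(request, 'QUERY').as_numpy()
            # flush=True so the output shows up immediately in the Triton server log
            print(f"[preprocessing] request {idx}: QUERY={query!r} "
                  f"shape={query.shape} dtype={query.dtype}", flush=True)
            # ... keep the rest of the original execute() unchanged ...
        return responses
```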
Have you tried the `tensorrt_llm_bls` module?
Here is an example: https://github.com/triton-inference-server/tensorrtllm_backend/tree/main/inflight_batcher_llm#running-lora-inference-with-inflight-batching
Currently, the TRT-LLM backend does not support such a requirement.
Could you try the latest main branch?
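If you do try the `tensorrt_llm_bls` route, a minimal client-side sketch using Triton's HTTP generate endpoint looks roughly like the one below (the port, model name, and field values are assumptions, adjust them to your deployment):

```python
# Minimal sketch of a request to the tensorrt_llm_bls model via Triton's HTTP
# generate endpoint; port, model name, and parameter values are assumptions.
import requests

payload = {
    "text_input": "What is machine learning?",
    "max_tokens": 64,
    "bad_words": "",
    "stop_words": "",
}
resp = requests.post(
    "http://localhost:8000/v2/models/tensorrt_llm_bls/generate",
    json=payload,
    timeout=60,
)
resp.raise_for_status()
print(resp.json()["text_output"])
```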
You need to set up some runtime parameters like `triton_max_batch_size`, `max_beam_width`, etc. (the parameters written like `${xxx}`). Here is the document: https://github.com/triton-inference-server/tensorrtllm_backend/blob/main/docs/gemma.md#end-to-end-workflow-to-run-sp-model.
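As a rough sketch, the `${xxx}` placeholders are usually filled with the `tools/fill_template.py` script shipped in the repo; the parameter names and values below are examples only and depend on which version of `config.pbtxt` you use:

```python
# Hedged example: substitute the ${...} placeholders in config.pbtxt with
# tools/fill_template.py (-i edits the file in place). Values are examples.
import subprocess

substitutions = ",".join([
    "triton_max_batch_size:64",
    "max_beam_width:1",
    "decoupled_mode:False",
    "engine_dir:/path/to/engines",  # example path, point it at your engines
])
subprocess.run(
    [
        "python3", "tools/fill_template.py", "-i",
        "all_models/inflight_batcher_llm/tensorrt_llm/config.pbtxt",
        substitutions,
    ],
    check=True,
)
```

The preprocessing/postprocessing/ensemble configs have their own placeholders (for example `tokenizer_dir` and `triton_max_batch_size`) and need to be filled the same way.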
The TensorRT-LLM version of `nvcr.io/nvidia/tritonserver:23.12-trtllm-python-py3` is v0.7.0, so you will encounter such an issue when you build the engine with v0.7.1. I suggest using the dockerfile to build the docker image...
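A quick way to confirm which TensorRT-LLM version a container actually ships (assuming the `tensorrt_llm` wheel is importable inside it):

```python
# The 23.12 image is expected to report 0.7.0, so engines built with 0.7.1
# need an image rebuilt from the backend's dockerfile.
import tensorrt_llm
print(tensorrt_llm.__version__)
```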