
What is the recommended way to run benchmarks?

Open sleepwalker2017 opened this issue 1 year ago • 7 comments

  1. There are two folders, python and cpp. What is the relationship between them?

  2. Which one is recommended for benchmarking?

  3. I want to benchmark token throughput when using LoRA. It seems only static batching is supported currently. Is there a script to benchmark performance with LoRA?

sleepwalker2017 avatar Apr 08 '24 11:04 sleepwalker2017

  1. The scripts in python benchmark the Python runtime of TensorRT-LLM, while cpp contains scripts to benchmark the C++ runtime, which supports benchmarking both the static batching and inflight batching implementations.
  2. The C++ runtime is the recommended one for benchmarking.
  3. Inflight batching is supported for LoRA. You can find documentation on benchmarking LoRA here, and more details about LoRA here.

Thanks.
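For context on question 3: whichever runtime produces the numbers, token throughput is simply the total number of generated tokens divided by wall-clock time. A minimal sketch (the request counts and timing below are made up for illustration, not real benchmark output):

```python
def token_throughput(total_output_tokens: int, wall_time_s: float) -> float:
    """Tokens generated per second over the whole benchmark run."""
    return total_output_tokens / wall_time_s

# Hypothetical example: 8 requests x 256 output tokens each, finishing in 4.0 s
print(token_throughput(8 * 256, 4.0))  # 512.0 tokens/s
```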

kaiyux avatar Apr 11 '24 14:04 kaiyux


Thank you, I'll try that

sleepwalker2017 avatar Apr 15 '24 02:04 sleepwalker2017


Hello, it seems there is something wrong with the LoRA feature.

python examples/llama/convert_checkpoint.py --model_dir ${MODEL_CHECKPOINT} \
                              --output_dir ${CONVERTED_CHECKPOINT} \
                              --dtype ${DTYPE} \
                              --tp_size ${TP} \
                              --pp_size 1 \
                              --lora_target_modules attn_qkv \
                              --max_lora_rank ${MAX_LORA_RANK}

The convert_checkpoint.py script has no --lora_target_modules or --max_lora_rank arguments.
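The failure is consistent with the script's argument parser simply not defining those flags. A minimal illustration of the symptom (the stand-in parser below is hypothetical, not the real convert_checkpoint.py parser):

```python
import argparse

# Hypothetical stand-in parser: it defines --dtype but not the LoRA flags,
# so the LoRA arguments come back as unrecognized instead of being consumed.
parser = argparse.ArgumentParser()
parser.add_argument("--dtype", default="float16")

known, unknown = parser.parse_known_args(
    ["--dtype", "float16", "--lora_target_modules", "attn_qkv", "--max_lora_rank", "64"]
)
print(unknown)  # ['--lora_target_modules', 'attn_qkv', '--max_lora_rank', '64']
```

With argparse's default parse_args(), the same input would exit with an "unrecognized arguments" error, which matches what running the outdated command produces.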

sleepwalker2017 avatar Apr 15 '24 03:04 sleepwalker2017

@sleepwalker2017 Sorry, the LoRA documents under the benchmark directory are outdated; we will fix them. Please refer to the documents here and try the commands there, which should be up to date.

kaiyux avatar Apr 18 '24 02:04 kaiyux


I got this new issue when generating test data for lora inflight batching https://github.com/NVIDIA/TensorRT-LLM/issues/1453

sleepwalker2017 avatar Apr 18 '24 03:04 sleepwalker2017

This issue is stale because it has been open 30 days with no activity. Remove stale label or comment or this will be closed in 15 days.

github-actions[bot] avatar May 27 '24 01:05 github-actions[bot]