What is the recommended way to run benchmarks?
-
There are two folders, python and cpp, what is the relationship between them?
-
Which one is recommended for benchmarking?
-
I want to benchmark token throughput when using LoRA. It seems only static batching is currently supported. Is there a script to benchmark performance with LoRA?
- Scripts in `python` benchmark the Python runtime of TensorRT-LLM, while `cpp` includes scripts to benchmark the C++ runtime, which supports benchmarking both the static batching and inflight batching implementations.
- The C++ runtime is recommended for benchmarking.
- Inflight batching is supported for LoRA. You can find documents to benchmark LoRA here, and more details regarding LoRA here.
Thanks.
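As a side note on what "token throughput" means in the question above, it is simply the number of generated tokens divided by wall-clock time. A minimal sketch (the function name and inputs are illustrative, not part of any TensorRT-LLM API):

```python
import time

def token_throughput(total_output_tokens: int, elapsed_seconds: float) -> float:
    """Tokens generated per second across the whole batch."""
    if elapsed_seconds <= 0:
        raise ValueError("elapsed time must be positive")
    return total_output_tokens / elapsed_seconds

# Example: timing a hypothetical generate() call would look like
#   start = time.perf_counter()
#   outputs = generate(...)            # placeholder, not a real API here
#   tput = token_throughput(n_tokens, time.perf_counter() - start)
print(token_throughput(4096, 2.0))  # 2048.0 tokens/s
```

Inflight batching typically reports higher throughput than static batching because finished requests are replaced immediately instead of waiting for the whole batch to drain.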
Thank you, I'll try that
Hello, it seems there is something wrong with the LoRA feature.
python examples/llama/convert_checkpoint.py --model_dir ${MODEL_CHECKPOINT} \
--output_dir ${CONVERTED_CHECKPOINT} \
--dtype ${DTYPE} \
--tp_size ${TP} \
--pp_size 1 \
--lora_target_modules attn_qkv \
--max_lora_rank ${MAX_LORA_RANK}
The convert_checkpoint.py script has no --lora_target_modules or --max_lora_rank arguments.
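A quick way to check which flags a given script version actually accepts, without reading its source, is `parse_known_args()`: unlike `parse_args()`, it returns unrecognized flags instead of exiting. The reduced parser below is a hypothetical stand-in for convert_checkpoint.py's real argument list, just to illustrate the failure mode:

```python
import argparse

# Hypothetical reduced parser; the real convert_checkpoint.py defines many
# more arguments, but in the reported version not the two LoRA flags.
parser = argparse.ArgumentParser(prog="convert_checkpoint.py")
parser.add_argument("--model_dir")
parser.add_argument("--output_dir")
parser.add_argument("--dtype")

# parse_known_args() collects flags the parser does not define into `unknown`
# rather than aborting, so you can see exactly which options are unsupported.
known, unknown = parser.parse_known_args(
    ["--model_dir", "m", "--lora_target_modules", "attn_qkv"]
)
print(unknown)  # ['--lora_target_modules', 'attn_qkv']
```

Running the real script with `--help` gives the same information from the command line.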
@sleepwalker2017 Sorry, the documents under the benchmark directory for LoRA are outdated; we will fix them. Please refer to the documents here and try the commands there, which should be up to date.
I ran into a new issue when generating test data for LoRA inflight batching: https://github.com/NVIDIA/TensorRT-LLM/issues/1453
This issue is stale because it has been open 30 days with no activity. Remove stale label or comment or this will be closed in 15 days.