
What is the recommended way to run benchmarks?

Open sleepwalker2017 opened this issue 1 year ago • 7 comments

  1. There are two folders, python and cpp. What is the relationship between them?

  2. Which one is recommended for benchmarking?

  3. I want to benchmark token throughput when using LoRA. It seems only static batching is supported currently. Is there a script to benchmark performance with LoRA?

sleepwalker2017 avatar Apr 08 '24 11:04 sleepwalker2017

  1. The scripts in python benchmark the Python runtime of TensorRT-LLM, while cpp contains scripts to benchmark the C++ runtime, which supports benchmarking both the static batching and inflight batching implementations.
  2. The C++ runtime is the recommended one for benchmarking.
  3. Inflight batching is supported for LoRA. You can find documentation on benchmarking LoRA here, and more details about LoRA here.

Thanks.
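For context on question 3: whichever runtime produces the numbers, token throughput is simply the total number of generated tokens divided by wall-clock time. A minimal sketch (the request counts and timing below are made up for illustration, not real benchmark output):

```python
def token_throughput(total_output_tokens: int, wall_time_s: float) -> float:
    """Tokens generated per second over the whole benchmark run."""
    return total_output_tokens / wall_time_s

# Hypothetical example: 8 requests x 256 output tokens each, finishing in 4.0 s
print(token_throughput(8 * 256, 4.0))  # 512.0 tokens/s
```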

kaiyux avatar Apr 11 '24 14:04 kaiyux


Thank you, I'll try that

sleepwalker2017 avatar Apr 15 '24 02:04 sleepwalker2017


Hello, it seems there is something wrong with the LoRA feature.

python examples/llama/convert_checkpoint.py --model_dir ${MODEL_CHECKPOINT} \
                              --output_dir ${CONVERTED_CHECKPOINT} \
                              --dtype ${DTYPE} \
                              --tp_size ${TP} \
                              --pp_size 1 \
                              --lora_target_modules attn_qkv \
                              --max_lora_rank ${MAX_LORA_RANK}

The convert_checkpoint.py script has no --lora_target_modules or --max_lora_rank arguments.
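The failure is consistent with the script's argument parser simply not defining those flags. A minimal illustration of the symptom (the stand-in parser below is hypothetical, not the real convert_checkpoint.py parser):

```python
import argparse

# Hypothetical stand-in parser: it defines --dtype but not the LoRA flags,
# so the LoRA arguments come back as unrecognized instead of being consumed.
parser = argparse.ArgumentParser()
parser.add_argument("--dtype", default="float16")

known, unknown = parser.parse_known_args(
    ["--dtype", "float16", "--lora_target_modules", "attn_qkv", "--max_lora_rank", "64"]
)
print(unknown)  # ['--lora_target_modules', 'attn_qkv', '--max_lora_rank', '64']
```

With argparse's default parse_args(), the same input would exit with an "unrecognized arguments" error, which matches what running the outdated command produces.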

sleepwalker2017 avatar Apr 15 '24 03:04 sleepwalker2017

@sleepwalker2017 Sorry, the LoRA documents under the benchmark directory are outdated; we will fix them. Please refer to the documents here and try the commands there, which should be up to date.

kaiyux avatar Apr 18 '24 02:04 kaiyux


I got this new issue when generating test data for lora inflight batching https://github.com/NVIDIA/TensorRT-LLM/issues/1453

sleepwalker2017 avatar Apr 18 '24 03:04 sleepwalker2017

This issue is stale because it has been open 30 days with no activity. Remove stale label or comment or this will be closed in 15 days.

github-actions[bot] avatar May 27 '24 01:05 github-actions[bot]