How to benchmark Llama 3 and Vicuna 2 with TensorRT-LLM's benchmark.py
I need to benchmark different models, but they are not listed in allowed_configs.py. How can I do it? Thanks.
Hi @Ourspolaire1, the currently recommended way is to use the `trtllm-build` command to build the models you want to benchmark, and then use `gptManagerBenchmark` to benchmark them. Please see the documentation:
- LLaMA 3 example: https://github.com/NVIDIA/TensorRT-LLM/tree/main/examples/llama#llama-v3-updates
- gptManagerBenchmark: https://github.com/NVIDIA/TensorRT-LLM/tree/main/benchmarks/cpp
Thanks @kaiyux for the reply. I hit a new error when running `python benchmark.py`:
```
[TensorRT-LLM] TensorRT-LLM version: 0.10.0.dev2024042300
[iZf8ziv3jfzkf1sys9b2ikZ:418418] *** Process received signal ***
[iZf8ziv3jfzkf1sys9b2ikZ:418418] Signal: Segmentation fault (11)
[iZf8ziv3jfzkf1sys9b2ikZ:418418] Signal code: Address not mapped (1)
[iZf8ziv3jfzkf1sys9b2ikZ:418418] Failing at address: 0x18
[iZf8ziv3jfzkf1sys9b2ikZ:418418] [ 0] /lib/x86_64-linux-gnu/libc.so.6(+0x3c050)[0x7f8c267bc050]
[iZf8ziv3jfzkf1sys9b2ikZ:418418] [ 1] /home/shared/trtllm/lib/python3.10/site-packages/tensorrt_llm/libs/libth_common.so(_ZN12tensorrt_llm4thop14TorchAllocator6mallocEmb+0x88)[0x7f8b71883af8]
[iZf8ziv3jfzkf1sys9b2ikZ:418418] [ 2] /home/shared/trtllm/lib/python3.10/site-packages/tensorrt_llm/libs/libtensorrt_llm.so(_ZN12tensorrt_llm6common10IAllocator8reMallocIiEEPT_S4_mb+0xb4)[0x7f8a1ca6cab4]
[iZf8ziv3jfzkf1sys9b2ikZ:418418] [ 3] /home/shared/trtllm/lib/python3.10/site-packages/tensorrt_llm/libs/libtensorrt_llm.so(_ZN12tensorrt_llm6layers18DynamicDecodeLayerIfE14allocateBufferEv+0x38)[0x7f8a1ca6d868]
[iZf8ziv3jfzkf1sys9b2ikZ:418418] [ 4] /home/shared/trtllm/lib/python3.10/site-packages/tensorrt_llm/libs/libtensorrt_llm.so(_ZN12tensorrt_llm6layers18DynamicDecodeLayerIfE10initializeEv+0x1c6)[0x7f8a1ca724d6]
[iZf8ziv3jfzkf1sys9b2ikZ:418418] [ 5] /home/shared/trtllm/lib/python3.10/site-packages/tensorrt_llm/libs/libtensorrt_llm.so(_ZN12tensorrt_llm6layers18DynamicDecodeLayerIfEC1ERKNS_7runtime12DecodingModeEiiiiP11CUstream_stSt10shared_ptrINS_6common10IAllocatorEEP14cudaDevicePropSt8optionalIiESG_+0x225)[0x7f8a1ca72975]
[iZf8ziv3jfzkf1sys9b2ikZ:418418] [ 6] /home/shared/trtllm/lib/python3.10/site-packages/tensorrt_llm/libs/libth_common.so(_ZN9torch_ext15FtDynamicDecodeIfEC2Emmmmii+0x2f8)[0x7f8b71861d18]
[iZf8ziv3jfzkf1sys9b2ikZ:418418] [ 7] /home/shared/trtllm/lib/python3.10/site-packages/tensorrt_llm/libs/libth_common.so(_ZN9torch_ext15DynamicDecodeOp14createInstanceEv+0x10f)[0x7f8b71846f6f]
[iZf8ziv3jfzkf1sys9b2ikZ:418418] [ 8] /home/shared/trtllm/lib/python3.10/site-packages/tensorrt_llm/libs/libth_common.so(_ZN9torch_ext15DynamicDecodeOpC1EllllllN3c1010ScalarTypeE+0x84)[0x7f8b71847034]
[iZf8ziv3jfzkf1sys9b2ikZ:418418] [ 9] /home/shared/trtllm/lib/python3.10/site-packages/tensorrt_llm/libs/libth_common.so(_ZNSt17_Function_handlerIFvRSt6vectorIN3c106IValueESaIS2_EEEZN5torch6class_IN9torch_ext15DynamicDecodeOpEE12defineMethodIZNSB_3defIJllllllNS1_10ScalarTypeEEEERSB_NS7_6detail5typesIvJDpT_EEESsSt16initializer_listINS7_3argEEEUlNS1_14tagged_capsuleISA_EEllllllSE_E_EEPNS7_3jit8FunctionESsT_SsSN_EUlS5_E_E9_M_invokeERKSt9_Any_dataS5_+0xf8)[0x7f8b71862588]
[iZf8ziv3jfzkf1sys9b2ikZ:418418] [10] /home/shared/trtllm/lib/python3.10/site-packages/torch/lib/libtorch_python.so(+0xa0f34e)[0x7f8c2440f34e]
[iZf8ziv3jfzkf1sys9b2ikZ:418418] [11] /home/shared/trtllm/lib/python3.10/site-packages/torch/lib/libtorch_python.so(+0xa0c8df)[0x7f8c2440c8df]
[iZf8ziv3jfzkf1sys9b2ikZ:418418] [12] /home/shared/trtllm/lib/python3.10/site-packages/torch/lib/libtorch_python.so(+0xa0e929)[0x7f8c2440e929]
[iZf8ziv3jfzkf1sys9b2ikZ:418418] [13] /home/shared/trtllm/lib/python3.10/site-packages/torch/lib/libtorch_python.so(+0x47de04)[0x7f8c23e7de04]
[iZf8ziv3jfzkf1sys9b2ikZ:418418] [14] python3(+0x1b2c86)[0x5639836adc86]
[iZf8ziv3jfzkf1sys9b2ikZ:418418] [15] python3(_PyObject_MakeTpCall+0x70)[0x56398361bb50]
[iZf8ziv3jfzkf1sys9b2ikZ:418418] [16] python3(+0xe1f19)[0x5639835dcf19]
[iZf8ziv3jfzkf1sys9b2ikZ:418418] [17] python3(+0x7511a)[0x56398357011a]
[iZf8ziv3jfzkf1sys9b2ikZ:418418] [18] python3(_PyObject_MakeTpCall+0x70)[0x56398361bb50]
[iZf8ziv3jfzkf1sys9b2ikZ:418418] [19] python3(_PyEval_EvalFrameDefault+0x53bf)[0x563983676daf]
[iZf8ziv3jfzkf1sys9b2ikZ:418418] [20] python3(+0x175c30)[0x563983670c30]
[iZf8ziv3jfzkf1sys9b2ikZ:418418] [21] python3(_PyObject_Call_Prepend+0x1ac)[0x56398361c8fc]
[iZf8ziv3jfzkf1sys9b2ikZ:418418] [22] python3(+0x153fd1)[0x56398364efd1]
[iZf8ziv3jfzkf1sys9b2ikZ:418418] [23] python3(+0x15140b)[0x56398364c40b]
[iZf8ziv3jfzkf1sys9b2ikZ:418418] [24] python3(_PyObject_MakeTpCall+0x1f7)[0x56398361bcd7]
[iZf8ziv3jfzkf1sys9b2ikZ:418418] [25] python3(_PyEval_EvalFrameDefault+0x562e)[0x56398367701e]
[iZf8ziv3jfzkf1sys9b2ikZ:418418] [26] python3(+0x175c30)[0x563983670c30]
[iZf8ziv3jfzkf1sys9b2ikZ:418418] [27] python3(_PyObject_Call_Prepend+0xd9)[0x56398361c829]
[iZf8ziv3jfzkf1sys9b2ikZ:418418] [28] python3(+0x153fd1)[0x56398364efd1]
[iZf8ziv3jfzkf1sys9b2ikZ:418418] [29] python3(+0x15140b)[0x56398364c40b]
[iZf8ziv3jfzkf1sys9b2ikZ:418418] *** End of error message ***
Segmentation fault
```
I do not know how to solve it. Is there a way? Thank you!
@Ourspolaire1
The way I did it is similar to what @kaiyux mentioned above:
- Follow the README.md to download the Llama 3 8B checkpoint.
- Convert the checkpoint and build the TensorRT-LLM engine. Update these commands with your required batch size and other settings, for example:
```shell
python3 convert_checkpoint.py \
    --meta_ckpt_dir /wkdir/Meta-Llama-3-8B \
    --output_dir ./tllm_checkpoint_2gpu_tp2 \
    --dtype float16 \
    --tp_size 2
```

```shell
trtllm-build \
    --checkpoint_dir ./tllm_checkpoint_2gpu_tp2 \
    --output_dir ./tmp/llama/8B/trt_engines/fp16/2-gpu/ \
    --gemm_plugin float16 \
    --max_batch_size 384 \
    --max_input_len 512 \
    --max_output_len 512 \
    --tp_size 2 \
    --profiling_verbosity detailed
```
- Finally, use the Python benchmarking script (`benchmark.py`) to run the benchmark. Tweak the commands a little, for example:
```shell
python3 benchmark.py \
    --engine_dir "/wkdir/TensorRT-LLM/examples/llama/tmp/llama/8B/trt_engines/fp16/1-gpu" \
    --mode plugin \
    --max_batch_size 384 \
    --max_input_len 128 \
    --max_output_len 128 \
    --batch_size 384 \
    --input_output_len "128,128"
```

```shell
mpirun -n 2 \
    python3 benchmark.py \
    --engine_dir "/wkdir/TensorRT-LLM/examples/llama/tmp/llama/8B/trt_engines/fp16/2-gpu" \
    --mode plugin \
    --max_batch_size 384 \
    --max_input_len 128 \
    --max_output_len 128 \
    --batch_size 384 \
    --input_output_len "128,128"
```
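As a side note on reading the results: benchmark.py reports a per-iteration latency, and the usual back-of-the-envelope conversion to generation throughput is batch size times output length divided by latency. A minimal sketch of that arithmetic (the function name is mine, not part of the TensorRT-LLM tooling, and it deliberately ignores tokenization and other per-request overheads):

```python
def generation_throughput(batch_size: int, output_len: int, latency_s: float) -> float:
    """Tokens generated per second across the whole batch.

    batch_size: number of sequences decoded together
    output_len: generated tokens per sequence
    latency_s:  end-to-end latency of one batch, in seconds
    """
    return batch_size * output_len / latency_s


# e.g. batch 384, 128 output tokens, 2.0 s per batch
print(generation_throughput(384, 128, 2.0))  # -> 24576.0 tokens/s
```

This is only a rough comparison metric across configurations; gptManagerBenchmark reports throughput directly and is the preferred tool for serving-style numbers.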
@raoofnaushad I hit a new error. benchmark.py works with batch_size=1, but when I change it to batch_size=2 or any other number the error below occurs. Do you know how to solve it? Thank you very much!
"
python3 benchmark.py --engine_dir "/home/shared/TensorRT-LLM-main/examples/llama/tmp/llama/8B/trt_engines/fp16/1-gpu"
--mode plugin
--batch_size "2"
--input_output_len "2,2"
"
```
[TensorRT-LLM] TensorRT-LLM version: 0.11.0.dev2024051400
Allocated 124.01 MiB for execution context memory.
/home/shared/trtl/lib/python3.10/site-packages/torch/nested/__init__.py:166: UserWarning: The PyTorch API of nested tensors is in prototype stage and will change in the near future. (Triggered internally at ../aten/src/ATen/NestedTensorImpl.cpp:177.)
  return _nested.nested_tensor(
[05/22/2024-11:36:26] [TRT] [E] 3: [executionContext.cpp::setInputShape::2037] Error Code 3: API Usage Error (Parameter check failed at: runtime/api/executionContext.cpp::setInputShape::2037, condition: engineDims.d[i] == dims.d[i] Static dimension mismatch while setting input shape.)
(the setInputShape error above is repeated 9 times in total)
[05/22/2024-11:36:26] [TRT] [E] 3: [executionContext.cpp::resolveSlots::2842] Error Code 3: API Usage Error (Parameter check failed at: runtime/api/executionContext.cpp::resolveSlots::2842, condition: allInputDimensionsSpecified(routine) )
Traceback (most recent call last):
  File "/home/shared/TensorRT-LLM-main/benchmarks/python/benchmark.py", line 419, in main
    benchmarker.run(inputs, config)
  File "/home/shared/TensorRT-LLM-main/benchmarks/python/gpt_benchmark.py", line 240, in run
    self.decoder.decode_batch(inputs[0],
  File "/home/shared/trtl/lib/python3.10/site-packages/tensorrt_llm/runtime/generation.py", line 3210, in decode_batch
    return self.decode(input_ids,
  File "/home/shared/trtl/lib/python3.10/site-packages/tensorrt_llm/runtime/generation.py", line 930, in wrapper
    ret = func(self, *args, **kwargs)
  File "/home/shared/trtl/lib/python3.10/site-packages/tensorrt_llm/runtime/generation.py", line 3431, in decode
    return self.decode_regular(
  File "/home/shared/trtl/lib/python3.10/site-packages/tensorrt_llm/runtime/generation.py", line 3045, in decode_regular
    should_stop, next_step_tensors, tasks, context_lengths, host_context_lengths, attention_mask, context_logits, generation_logits, encoder_input_lengths = self.handle_per_step(
  File "/home/shared/trtl/lib/python3.10/site-packages/tensorrt_llm/runtime/generation.py", line 2704, in handle_per_step
    raise RuntimeError(f"Executing TRT engine failed step={step}!")
RuntimeError: Executing TRT engine failed step=0!

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/shared/TensorRT-LLM-main/benchmarks/python/benchmark.py", line 518, in
```
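A "Static dimension mismatch" from setInputShape generally means the shapes the runtime is feeding the engine do not fit what the engine was built with, so it is worth sanity-checking the benchmark flags against the build-time limits before running. A minimal sketch of such a pre-flight check (the function name and the `limits` dict are hypothetical, not part of benchmark.py; note it only catches out-of-range values, whereas an engine built with a truly static batch dimension can also reject any batch size other than the exact one it was built for):

```python
def check_benchmark_args(limits: dict, batch_size: int,
                         input_len: int, output_len: int) -> list:
    """Return a list of human-readable problems with the requested shapes.

    `limits` holds the engine's build-time max_batch_size / max_input_len /
    max_output_len (read these from the engine directory's config.json).
    An empty list means the request is within the build-time bounds.
    """
    problems = []
    if batch_size > limits["max_batch_size"]:
        problems.append(f"batch_size {batch_size} exceeds "
                        f"max_batch_size {limits['max_batch_size']}")
    if input_len > limits["max_input_len"]:
        problems.append(f"input_len {input_len} exceeds "
                        f"max_input_len {limits['max_input_len']}")
    if output_len > limits["max_output_len"]:
        problems.append(f"output_len {output_len} exceeds "
                        f"max_output_len {limits['max_output_len']}")
    return problems
```

If the shapes are within bounds and the error persists, it may also be worth checking that the engine was built by the same TensorRT-LLM version as the installed wheel running benchmark.py, since mixing versions is a common cause of runtime failures like this.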
Hi @Ourspolaire1, do you still have any further issues or questions? If not, we'll close this soon.