TensorRT-LLM

Unable to run benchmark of mixtral

MustafaFayez opened this issue 1 year ago

Hi,

I am trying to benchmark Mistral-7B; however, I get this error:

BS: 64, ISL/OSL: 128,128
[TensorRT-LLM][ERROR] 3: [executionContext.cpp::setInputShape::2309] Error Code 3: API Usage Error (Parameter check failed at: runtime/api/executionContext.cpp::setInputShape::2309, condition: satisfyProfile Runtime dimension does not satisfy any optimization profile.)
(the error above is repeated once per MPI rank)
[TensorRT-LLM][ERROR] [TensorRT-LLM][ERROR] Assertion failed: Tensor 'last_token_ids' has invalid shape (64), expected (-1) (/code/tensorrt_llm/cpp/tensorrt_llm/runtime/tllmRuntime.cpp:150)
1       0x55f4467ca07a tensorrt_llm::common::throwRuntimeError(char const*, int, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&) + 100
2       0x7f0b2822e96f tensorrt_llm::runtime::TllmRuntime::setInputTensors(int, std::unordered_map<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::shared_ptr<tensorrt_llm::runtime::ITensor>, std::hash<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::equal_to<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::allocator<std::pair<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const, std::shared_ptr<tensorrt_llm::runtime::ITensor> > > > const&) + 1823
3       0x7f0b281ea6b2 tensorrt_llm::runtime::GptSession::executeContextStep(std::vector<tensorrt_llm::runtime::GenerationInput, std::allocator<tensorrt_llm::runtime::GenerationInput> > const&, std::vector<tensorrt_llm::runtime::GenerationOutput, std::allocator<tensorrt_llm::runtime::GenerationOutput> >&, std::vector<int, std::allocator<int> > const&, tensorrt_llm::batch_manager::kv_cache_manager::KVCacheManager const*) + 882
4       0x7f0b281ebac2 tensorrt_llm::runtime::GptSession::generateBatched(std::vector<tensorrt_llm::runtime::GenerationOutput, std::allocator<tensorrt_llm::runtime::GenerationOutput> >&, std::vector<tensorrt_llm::runtime::GenerationInput, std::allocator<tensorrt_llm::runtime::GenerationInput> > const&, tensorrt_llm::runtime::SamplingConfig const&, std::function<void (int, bool)> const&) + 3138
5       0x7f0b281ed1e3 tensorrt_llm::runtime::GptSession::generate(tensorrt_llm::runtime::GenerationOutput&, tensorrt_llm::runtime::GenerationInput const&, tensorrt_llm::runtime::SamplingConfig const&) + 2787
6       0x55f4467cfaf3 ../../cpp/build/benchmarks/gptSessionBenchmark(+0x1aaf3) [0x55f4467cfaf3]
7       0x7f0adac3cd90 /usr/lib/x86_64-linux-gnu/libc.so.6(+0x29d90) [0x7f0adac3cd90]
8       0x7f0adac3ce40 __libc_start_main + 128
9       0x55f4467d1a85 ../../cpp/build/benchmarks/gptSessionBenchmark(+0x1ca85) [0x55f4467d1a85]
(the same assertion failure and stack trace are repeated for each of the remaining MPI ranks; only the runtime addresses differ)

This is my build command:

python ../llama/build.py --model_dir /mnt/mnt/Mistral-7B-v0.1 --use_inflight_batching --enable_context_fmha --parallel_build --output_dir ./tmp/mistral/trt_engines/fp16/8-gpu/ --dtype float16 --world_size 8 --tp_size 8 --pp_size 1

This is my benchmark script:

in_out_sizes=("64:128,128" "64:128,2048" "64:2048,128" "64:2048,2048")
for in_out in "${in_out_sizes[@]}"
do
        batch_size=$(echo "$in_out" | awk -F':' '{ print $1 }')
        in_out_dims=$(echo "$in_out" | awk -F':' '{ print $2 }')
        echo "BS: $batch_size, ISL/OSL: $in_out_dims"
        mpirun -n 8 --allow-run-as-root --oversubscribe ../../cpp/build/benchmarks/gptSessionBenchmark --model llama --engine_dir ./tmp/mistral/trt_engines/fp16/8-gpu/ --warm_up 1 --batch_size "$batch_size" --duration 0 --num_runs 5 --input_output_len "$in_out_dims"
done
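For reference, the `BS:ISL,OSL` split the script performs with awk can also be done with plain bash parameter expansion, avoiding two subprocesses per iteration. A minimal sketch (the sample value is illustrative):

```shell
#!/usr/bin/env bash
# Split a "batch:in,out" spec without spawning awk.
in_out="64:128,2048"
batch_size="${in_out%%:*}"   # text before the first ':'  -> 64
in_out_dims="${in_out#*:}"   # text after the first ':'   -> 128,2048
echo "BS: $batch_size, ISL/OSL: $in_out_dims"
```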

Thank you.

MustafaFayez avatar Feb 08 '24 22:02 MustafaFayez

I have the same issue. @MustafaFayez, did you find a fix for this?

premmotgi avatar Feb 13 '24 22:02 premmotgi

Nope, hoping we get a reply from the contributors.

MustafaFayez avatar Feb 15 '24 18:02 MustafaFayez

You need to raise your max token size. 2309 >= your max token size here.

noahnisbet avatar Feb 25 '24 20:02 noahnisbet

@MustafaFayez Not sure which version you used, but if you're using v0.7.1, --max_batch_size defaults to 8 (see here). I see that you did not specify --max_batch_size while trying to benchmark with batch size 64, which is not expected to work.

Can you please try setting --max_batch_size, --max_input_len and --max_output_len, and see if the issue persists?
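For example, rebuilding with explicit limits that cover the benchmark's largest case (batch size 64, ISL/OSL up to 2048/2048) could look like the sketch below; the flag values are illustrative and should be chosen to match your benchmark settings:

```shell
python ../llama/build.py --model_dir /mnt/mnt/Mistral-7B-v0.1 \
    --use_inflight_batching --enable_context_fmha --parallel_build \
    --output_dir ./tmp/mistral/trt_engines/fp16/8-gpu/ \
    --dtype float16 --world_size 8 --tp_size 8 --pp_size 1 \
    --max_batch_size 64 --max_input_len 2048 --max_output_len 2048
```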

kaiyux avatar Feb 26 '24 11:02 kaiyux