TensorRT-LLM
API Usage Error
System Info
NVIDIA A100
Who can help?
@seanprime7 @Superjomn @aaronp24 @lukeyeager @aflat
Information
- [X] The official example scripts
- [ ] My own modified scripts
Tasks
- [X] An officially supported task in the `examples` folder (such as GLUE/SQuAD, ...)
- [ ] My own task or dataset (give details below)
Reproduction
Following the example in examples/llama; a rough sketch of the code path it exercises is below.
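For reference, this is a minimal sketch of the Python call path that examples/run.py goes through (ModelRunner is the class that appears in the traceback below). The tokenizer/engine paths and generation arguments here are placeholders and assumptions, not the exact command I ran:

```python
# Rough sketch of the examples/run.py code path; paths and arguments are
# placeholders, not the exact reproduction command.
from transformers import AutoTokenizer
from tensorrt_llm.runtime import ModelRunner

tokenizer = AutoTokenizer.from_pretrained("/path/to/llama-hf")    # assumed tokenizer dir
runner = ModelRunner.from_dir(engine_dir="/path/to/trt_engines")  # assumed engine dir

# run.py passes the batch as a list of 1-D token-id tensors.
batch_input_ids = [tokenizer("Hello, my name is", return_tensors="pt").input_ids[0]]

outputs = runner.generate(
    batch_input_ids,
    max_new_tokens=32,
    end_id=tokenizer.eos_token_id,
    pad_id=tokenizer.pad_token_id or tokenizer.eos_token_id,
)
# outputs has shape [batch, num_beams, seq_len]
print(tokenizer.decode(outputs[0][0], skip_special_tokens=True))
```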
Expected behavior
Inference runs normally.
Actual behavior
[01/31/2024-08:24:58] [TRT] [E] 3: [executionContext.cpp::resolveSlots::2991] Error Code 3: API Usage Error (Parameter check failed at: runtime/api/executionContext.cpp::resolveSlots::2991, condition: allInputDimensionsSpecified(routine) )
[01/31/2024-08:24:58] [TRT] [E] 3: [executionContext.cpp::resolveSlots::2991] Error Code 3: API Usage Error (Parameter check failed at: runtime/api/executionContext.cpp::resolveSlots::2991, condition: allInputDimensionsSpecified(routine) )
[01/31/2024-08:24:58] [TRT] [E] 3: [executionContext.cpp::resolveSlots::2991] Error Code 3: API Usage Error (Parameter check failed at: runtime/api/executionContext.cpp::resolveSlots::2991, condition: allInputDimensionsSpecified(routine) )
[01/31/2024-08:24:58] [TRT] [E] 3: [executionContext.cpp::resolveSlots::2991] Error Code 3: API Usage Error (Parameter check failed at: runtime/api/executionContext.cpp::resolveSlots::2991, condition: allInputDimensionsSpecified(routine) )
[01/31/2024-08:24:58] [TRT] [E] 3: [executionContext.cpp::resolveSlots::2991] Error Code 3: API Usage Error (Parameter check failed at: runtime/api/executionContext.cpp::resolveSlots::2991, condition: allInputDimensionsSpecified(routine) )
[01/31/2024-08:24:58] [TRT] [E] 3: [executionContext.cpp::resolveSlots::2991] Error Code 3: API Usage Error (Parameter check failed at: runtime/api/executionContext.cpp::resolveSlots::2991, condition: allInputDimensionsSpecified(routine) )
[01/31/2024-08:24:58] [TRT] [E] 3: [executionContext.cpp::resolveSlots::2991] Error Code 3: API Usage Error (Parameter check failed at: runtime/api/executionContext.cpp::resolveSlots::2991, condition: allInputDimensionsSpecified(routine) )
[01/31/2024-08:24:58] [TRT] [E] 3: [executionContext.cpp::resolveSlots::2991] Error Code 3: API Usage Error (Parameter check failed at: runtime/api/executionContext.cpp::resolveSlots::2991, condition: allInputDimensionsSpecified(routine) )
[01/31/2024-08:24:58] [TRT] [E] 3: [executionContext.cpp::resolveSlots::2991] Error Code 3: API Usage Error (Parameter check failed at: runtime/api/executionContext.cpp::resolveSlots::2991, condition: allInputDimensionsSpecified(routine) )
[01/31/2024-08:24:58] [TRT] [E] 3: [executionContext.cpp::resolveSlots::2991] Error Code 3: API Usage Error (Parameter check failed at: runtime/api/executionContext.cpp::resolveSlots::2991, condition: allInputDimensionsSpecified(routine) )
[01/31/2024-08:24:58] [TRT] [E] 3: [executionContext.cpp::resolveSlots::2991] Error Code 3: API Usage Error (Parameter check failed at: runtime/api/executionContext.cpp::resolveSlots::2991, condition: allInputDimensionsSpecified(routine) )
[01/31/2024-08:24:58] [TRT] [E] 3: [executionContext.cpp::resolveSlots::2991] Error Code 3: API Usage Error (Parameter check failed at: runtime/api/executionContext.cpp::resolveSlots::2991, condition: allInputDimensionsSpecified(routine) )
[01/31/2024-08:24:58] [TRT] [E] 3: [executionContext.cpp::resolveSlots::2991] Error Code 3: API Usage Error (Parameter check failed at: runtime/api/executionContext.cpp::resolveSlots::2991, condition: allInputDimensionsSpecified(routine) )
[01/31/2024-08:24:58] [TRT] [E] 3: [executionContext.cpp::resolveSlots::2991] Error Code 3: API Usage Error (Parameter check failed at: runtime/api/executionContext.cpp::resolveSlots::2991, condition: allInputDimensionsSpecified(routine) )
Traceback (most recent call last):
File "/workspace/example/run.py", line 532, in <module>
main(args)
File "/workspace/example/run.py", line 410, in main
outputs = runner.generate(
File "/workspace/TensorRT-LLM/tensorrt_llm/runtime/model_runner.py", line 598, in generate
outputs = self.session.decode(
File "/workspace/TensorRT-LLM/tensorrt_llm/runtime/generation.py", line 755, in wrapper
ret = func(self, *args, **kwargs)
File "/workspace/TensorRT-LLM/tensorrt_llm/runtime/generation.py", line 2891, in decode
return self.decode_regular(
File "/workspace/TensorRT-LLM/tensorrt_llm/runtime/generation.py", line 2548, in decode_regular
should_stop, next_step_tensors, tasks, context_lengths, host_context_lengths, attention_mask, logits, encoder_input_lengths = self.handle_per_step(
File "/workspace/TensorRT-LLM/tensorrt_llm/runtime/generation.py", line 2231, in handle_per_step
raise RuntimeError(f"Executing TRT engine failed step={step}!")
RuntimeError: Executing TRT engine failed step=0!
--------------------------------------------------------------------------
Primary job terminated normally, but 1 process returned
a non-zero exit code. Per user-direction, the job has been aborted.
--------------------------------------------------------------------------
--------------------------------------------------------------------------
mpirun detected that one or more processes exited with non-zero status, thus causing
the job to be terminated. The first process to do so was:
Process name: [[18601,1],0]
Exit code: 1
--------------------------------------------------------------------------
Additional notes
I am using the TensorRT-LLM main branch.
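In case it helps triage: the TensorRT message above is raised when execution is reached while some dynamic input dimension on the execution context has not been given a concrete value. Below is a minimal sketch, using only the plain TensorRT Python API (the engine file path is a placeholder, not the actual engine name), of how one might list the engine's input tensors and spot any -1 dimensions that still need to be set:

```python
# Minimal sketch for inspecting input tensor shapes on a built engine.
# The engine file path is a placeholder; point it at the real rank-0 engine.
import tensorrt as trt

logger = trt.Logger(trt.Logger.WARNING)
with open("/path/to/trt_engines/rank0.engine", "rb") as f:
    engine = trt.Runtime(logger).deserialize_cuda_engine(f.read())

context = engine.create_execution_context()
for i in range(engine.num_io_tensors):
    name = engine.get_tensor_name(i)
    if engine.get_tensor_mode(name) == trt.TensorIOMode.INPUT:
        # Any -1 in the engine shape is a dynamic dimension; it must be given
        # a concrete value via context.set_input_shape(name, shape) before
        # execution, otherwise TensorRT fails the allInputDimensionsSpecified check.
        print(name,
              tuple(engine.get_tensor_shape(name)),
              tuple(context.get_tensor_shape(name)))
```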
Could you please share the build and run commands?
I am also getting this error; more details are in this issue: https://github.com/NVIDIA/trt-llm-rag-windows/issues/38