TensorRT-LLM
API Usage Error
System Info
NVIDIA A100
Who can help?
@seanprime7 @Superjomn @aaronp24 @lukeyeager @aflat
Information
- [X] The official example scripts
- [ ] My own modified scripts
Tasks
- [X] An officially supported task in the `examples` folder (such as GLUE/SQuAD, ...)
- [ ] My own task or dataset (give details below)
Reproduction
Following the example in examples/llama; a rough sketch of the code path it exercises is below.
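For reference, this is a minimal sketch of the Python call path that examples/run.py goes through (ModelRunner is the class that appears in the traceback below). The tokenizer/engine paths and generation arguments here are placeholders and assumptions, not the exact command I ran:

```python
# Rough sketch of the examples/run.py code path; paths and arguments are
# placeholders, not the exact reproduction command.
from transformers import AutoTokenizer
from tensorrt_llm.runtime import ModelRunner

tokenizer = AutoTokenizer.from_pretrained("/path/to/llama-hf")    # assumed tokenizer dir
runner = ModelRunner.from_dir(engine_dir="/path/to/trt_engines")  # assumed engine dir

# run.py passes the batch as a list of 1-D token-id tensors.
batch_input_ids = [tokenizer("Hello, my name is", return_tensors="pt").input_ids[0]]

outputs = runner.generate(
    batch_input_ids,
    max_new_tokens=32,
    end_id=tokenizer.eos_token_id,
    pad_id=tokenizer.pad_token_id or tokenizer.eos_token_id,
)
# outputs has shape [batch, num_beams, seq_len]
print(tokenizer.decode(outputs[0][0], skip_special_tokens=True))
```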
Expected behavior
Inference runs normally.
Actual behavior
[01/31/2024-08:24:58] [TRT] [E] 3: [executionContext.cpp::resolveSlots::2991] Error Code 3: API Usage Error (Parameter check failed at: runtime/api/executionContext.cpp::resolveSlots::2991, condition: allInputDimensionsSpecified(routine) )
[01/31/2024-08:24:58] [TRT] [E] 3: [executionContext.cpp::resolveSlots::2991] Error Code 3: API Usage Error (Parameter check failed at: runtime/api/executionContext.cpp::resolveSlots::2991, condition: allInputDimensionsSpecified(routine) )
[01/31/2024-08:24:58] [TRT] [E] 3: [executionContext.cpp::resolveSlots::2991] Error Code 3: API Usage Error (Parameter check failed at: runtime/api/executionContext.cpp::resolveSlots::2991, condition: allInputDimensionsSpecified(routine) )
[01/31/2024-08:24:58] [TRT] [E] 3: [executionContext.cpp::resolveSlots::2991] Error Code 3: API Usage Error (Parameter check failed at: runtime/api/executionContext.cpp::resolveSlots::2991, condition: allInputDimensionsSpecified(routine) )
[01/31/2024-08:24:58] [TRT] [E] 3: [executionContext.cpp::resolveSlots::2991] Error Code 3: API Usage Error (Parameter check failed at: runtime/api/executionContext.cpp::resolveSlots::2991, condition: allInputDimensionsSpecified(routine) )
[01/31/2024-08:24:58] [TRT] [E] 3: [executionContext.cpp::resolveSlots::2991] Error Code 3: API Usage Error (Parameter check failed at: runtime/api/executionContext.cpp::resolveSlots::2991, condition: allInputDimensionsSpecified(routine) )
[01/31/2024-08:24:58] [TRT] [E] 3: [executionContext.cpp::resolveSlots::2991] Error Code 3: API Usage Error (Parameter check failed at: runtime/api/executionContext.cpp::resolveSlots::2991, condition: allInputDimensionsSpecified(routine) )
[01/31/2024-08:24:58] [TRT] [E] 3: [executionContext.cpp::resolveSlots::2991] Error Code 3: API Usage Error (Parameter check failed at: runtime/api/executionContext.cpp::resolveSlots::2991, condition: allInputDimensionsSpecified(routine) )
[01/31/2024-08:24:58] [TRT] [E] 3: [executionContext.cpp::resolveSlots::2991] Error Code 3: API Usage Error (Parameter check failed at: runtime/api/executionContext.cpp::resolveSlots::2991, condition: allInputDimensionsSpecified(routine) )
[01/31/2024-08:24:58] [TRT] [E] 3: [executionContext.cpp::resolveSlots::2991] Error Code 3: API Usage Error (Parameter check failed at: runtime/api/executionContext.cpp::resolveSlots::2991, condition: allInputDimensionsSpecified(routine) )
[01/31/2024-08:24:58] [TRT] [E] 3: [executionContext.cpp::resolveSlots::2991] Error Code 3: API Usage Error (Parameter check failed at: runtime/api/executionContext.cpp::resolveSlots::2991, condition: allInputDimensionsSpecified(routine) )
[01/31/2024-08:24:58] [TRT] [E] 3: [executionContext.cpp::resolveSlots::2991] Error Code 3: API Usage Error (Parameter check failed at: runtime/api/executionContext.cpp::resolveSlots::2991, condition: allInputDimensionsSpecified(routine) )
[01/31/2024-08:24:58] [TRT] [E] 3: [executionContext.cpp::resolveSlots::2991] Error Code 3: API Usage Error (Parameter check failed at: runtime/api/executionContext.cpp::resolveSlots::2991, condition: allInputDimensionsSpecified(routine) )
[01/31/2024-08:24:58] [TRT] [E] 3: [executionContext.cpp::resolveSlots::2991] Error Code 3: API Usage Error (Parameter check failed at: runtime/api/executionContext.cpp::resolveSlots::2991, condition: allInputDimensionsSpecified(routine) )
Traceback (most recent call last):
File "/workspace/example/run.py", line 532, in <module>
main(args)
File "/workspace/example/run.py", line 410, in main
outputs = runner.generate(
File "/workspace/TensorRT-LLM/tensorrt_llm/runtime/model_runner.py", line 598, in generate
outputs = self.session.decode(
File "/workspace/TensorRT-LLM/tensorrt_llm/runtime/generation.py", line 755, in wrapper
ret = func(self, *args, **kwargs)
File "/workspace/TensorRT-LLM/tensorrt_llm/runtime/generation.py", line 2891, in decode
return self.decode_regular(
File "/workspace/TensorRT-LLM/tensorrt_llm/runtime/generation.py", line 2548, in decode_regular
should_stop, next_step_tensors, tasks, context_lengths, host_context_lengths, attention_mask, logits, encoder_input_lengths = self.handle_per_step(
File "/workspace/TensorRT-LLM/tensorrt_llm/runtime/generation.py", line 2231, in handle_per_step
raise RuntimeError(f"Executing TRT engine failed step={step}!")
RuntimeError: Executing TRT engine failed step=0!
--------------------------------------------------------------------------
Primary job terminated normally, but 1 process returned
a non-zero exit code. Per user-direction, the job has been aborted.
--------------------------------------------------------------------------
--------------------------------------------------------------------------
mpirun detected that one or more processes exited with non-zero status, thus causing
the job to be terminated. The first process to do so was:
Process name: [[18601,1],0]
Exit code: 1
--------------------------------------------------------------------------
Additional notes
I am using the TensorRT-LLM main branch.
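In case it helps triage: the TensorRT message above is raised when execution is reached while some dynamic input dimension on the execution context has not been given a concrete value. Below is a minimal sketch, using only the plain TensorRT Python API (the engine file path is a placeholder, not the actual engine name), of how one might list the engine's input tensors and spot any -1 dimensions that still need to be set:

```python
# Minimal sketch for inspecting input tensor shapes on a built engine.
# The engine file path is a placeholder; point it at the real rank-0 engine.
import tensorrt as trt

logger = trt.Logger(trt.Logger.WARNING)
with open("/path/to/trt_engines/rank0.engine", "rb") as f:
    engine = trt.Runtime(logger).deserialize_cuda_engine(f.read())

context = engine.create_execution_context()
for i in range(engine.num_io_tensors):
    name = engine.get_tensor_name(i)
    if engine.get_tensor_mode(name) == trt.TensorIOMode.INPUT:
        # Any -1 in the engine shape is a dynamic dimension; it must be given
        # a concrete value via context.set_input_shape(name, shape) before
        # execution, otherwise TensorRT fails the allInputDimensionsSpecified check.
        print(name,
              tuple(engine.get_tensor_shape(name)),
              tuple(context.get_tensor_shape(name)))
```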
Could you please share the build and run commands?
I am also getting this error; more details are in this issue: https://github.com/NVIDIA/trt-llm-rag-windows/issues/38