
Building T5 with `--debug_mode` flag causes it to not run successfully

Open · varyn-woo opened this issue 1 year ago

System Info

CPU: x86_64
OS: Linux Ubuntu
GPU: NVIDIA A100 and A10G (through Latitude.sh and AWS EC2, respectively)
TensorRT-LLM version: 0.9.0

Who can help?

No response

Information

  • [X] The official example scripts
  • [ ] My own modified scripts

Tasks

  • [X] An officially supported task in the examples folder (such as GLUE/SQuAD, ...)
  • [ ] My own task or dataset (give details below)

Reproduction

When I convert and build a T5 model (I tried t5-small and t5-base from Hugging Face) without --remove_input_padding and with the --debug_mode build flag enabled (all other parameters equal to the single-GPU example command in the example README) on an NVIDIA A100 or A10G GPU, encoder_run fails with the following error:

[04/12/2024-21:13:17] [TRT] [E] 3: [executionContext.cpp::enqueueV3::2650] Error Code 3: API Usage Error (Parameter check failed at: runtime/api/executionContext.cpp::enqueueV3::2650, condition: mContext.profileObliviousBindings.at(profileObliviousIndex) || getPtrOrNull(mOutputAllocators, profileObliviousIndex) )

The error does not occur when running on an NVIDIA H100, so the problem seems localized to the Ampere architectures (sm80 on the A100, sm86 on the A10G). It may affect others as well, but these and sm90 are the only ones I have tested.

Expected behavior

A successful run of the TensorRT-LLM version of a T5 model, with layer-by-layer outputs printed for debugging purposes.

Actual behavior

Runtime execution fails with an assert in encoder_run, preceded by Error Code 3: API Usage Error (Parameter check failed at: runtime/api/executionContext.cpp::enqueueV3::2650, condition: mContext.profileObliviousBindings.at(profileObliviousIndex) || getPtrOrNull(mOutputAllocators, profileObliviousIndex) ) during the call to self.encoder_session.run(inputs, outputs, self.stream.cuda_stream).

Additional notes

I tried googling the error, but the results were generally unhelpful. This could be an issue in TensorRT itself rather than in the TensorRT-LLM wrapper.
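For anyone hitting the same message: the failing condition in enqueueV3 checks that every output tensor has either a device address bound or an output allocator registered before the engine is enqueued. A plausible (unconfirmed) reading is that --debug_mode marks extra per-layer tensors as engine outputs, and the runtime binds only the regular outputs, leaving the debug tensors unbound. A toy Python sketch of that check (hypothetical names, not TensorRT's actual API):

```python
# Toy illustration of TensorRT's pre-enqueue output check, NOT real
# TensorRT code: every engine output must have either a bound address
# or a registered output allocator, otherwise enqueue fails with an
# "API Usage Error" analogous to the one in this issue.

class ToyContext:
    def __init__(self, output_names):
        self.output_names = list(output_names)
        self.addresses = {}    # tensor name -> bound device address
        self.allocators = {}   # tensor name -> output allocator

    def set_tensor_address(self, name, addr):
        self.addresses[name] = addr

    def set_output_allocator(self, name, allocator):
        self.allocators[name] = allocator

    def enqueue(self):
        for name in self.output_names:
            if name not in self.addresses and name not in self.allocators:
                raise RuntimeError(
                    f"API Usage Error: output '{name}' has neither a "
                    "bound address nor an output allocator"
                )
        return True

# A debug build exposes an extra output; binding only the regular
# output reproduces the failure, binding both resolves it.
ctx = ToyContext(["encoder_output", "debug_layer_0_output"])
ctx.set_tensor_address("encoder_output", 0x1000)
try:
    ctx.enqueue()
except RuntimeError as e:
    print(e)
ctx.set_tensor_address("debug_layer_0_output", 0x2000)
print(ctx.enqueue())
```

If this reading is right, the fix would belong in the enc_dec runtime: when --debug_mode is set, allocate and bind buffers (or register allocators) for the debug outputs as well.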

varyn-woo avatar Apr 16 '24 20:04 varyn-woo

@symphonylyh any updates?

poweiw avatar May 16 '25 21:05 poweiw

@varyn-woo, apologies for the very delayed response. Is this ticket still relevant? If so, could you try the latest version to see if the issue persists?

karljang avatar Oct 21 '25 05:10 karljang

Issue has not received an update in over 14 days. Adding stale label.

github-actions[bot] avatar Nov 05 '25 03:11 github-actions[bot]

Closing this issue as stale. If the problem persists in the latest release, please feel free to open a new one. Thank you!

karljang avatar Nov 14 '25 18:11 karljang