TensorRT-LLM
Missing kernels for sm_87 (Jetson Orin AGX)
System Info
Jetson Orin AGX, using TensorRT-LLM version 0.10 installed from pip
Who can help?
No response
Information
- [X] The official example scripts
- [ ] My own modified scripts
Tasks
- [X] An officially supported task in the examples folder (such as GLUE/SQuAD, ...)
- [ ] My own task or dataset (give details below)
Reproduction
Run the official examples for Llama.
Expected behavior
The optimized fused MHA kernels are used.
actual behavior
[TensorRT-LLM][WARNING] Fall back to unfused MHA because of unsupported head size 128 in sm_87.
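To check whether a run hit this fallback, the log can be scanned for the warning above. This is a minimal sketch: the regular expression is written against the exact wording quoted in this issue, and the helper name `find_mha_fallbacks` is hypothetical. Other TensorRT-LLM versions may word the message differently.

```python
import re

# Matches the warning text quoted in this issue. The wording is taken from
# the log line above; other versions may phrase it differently (assumption).
FALLBACK_RE = re.compile(
    r"Fall back to unfused MHA because of unsupported head size (\d+) in sm_(\d+)"
)


def find_mha_fallbacks(log_text):
    """Return a sorted list of unique (head_size, sm) pairs found in log_text."""
    hits = {(int(head), int(sm)) for head, sm in FALLBACK_RE.findall(log_text)}
    return sorted(hits)


sample = (
    "[TensorRT-LLM][WARNING] Fall back to unfused MHA because of "
    "unsupported head size 128 in sm_87."
)
print(find_mha_fallbacks(sample))  # [(128, 87)]
```

An empty list means no fallback warning of this form appeared in the captured log.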
additional notes
Version 0.10 is available from pip for Jetson Orin AGX. However, it is not very useful there because the package does not include compiled kernels for sm_87.
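For anyone unsure which architecture string applies to their device, the sketch below derives it from the CUDA compute capability. It assumes PyTorch is installed and uses `torch.cuda.get_device_capability`; the helper names `sm_name` and `current_sm` are hypothetical and are not part of TensorRT-LLM.

```python
def sm_name(major, minor):
    """Format a compute capability pair as an architecture string, e.g. sm_87."""
    return f"sm_{major}{minor}"


def current_sm():
    """Return the sm string for GPU 0, or None if PyTorch or CUDA is unavailable."""
    try:
        import torch  # assumption: PyTorch is installed on the device
    except ImportError:
        return None
    if not torch.cuda.is_available():
        return None
    major, minor = torch.cuda.get_device_capability(0)
    return sm_name(major, minor)


# Jetson Orin AGX has compute capability 8.7.
print(sm_name(8, 7))  # sm_87
```

Calling `current_sm()` on the device gives the string to compare against the architecture named in the warning.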
TensorRT-LLM does not currently have fused MHA kernels for sm_87. If you are interested, we can change this issue to a feature request.
Closing it for now; you may reopen it as a feature request.