TensorRT-LLM
ChatGLMHeadModel/layers/0/pre_norm/PLUGIN_V2_Rmsnorm_0: could not find any supported formats consistent with input/output data types
System Info
GPU: V100, CUDA 11.4
Who can help?
No response
Information
- [X] The official example scripts
- [ ] My own modified scripts
Tasks
- [X] An officially supported task in the examples folder (such as GLUE/SQuAD, ...)
- [ ] My own task or dataset (give details below)
Reproduction
python build.py --model_dir ./chatglm2-6b --output_dir ./chatglm2-6b.trt.f32 --dtype float32
Expected behavior
The engine builds successfully.
Actual behavior
[01/17/2024-11:12:06] [TRT] [I] BuilderFlag::kTF32 is set but hardware does not support TF32. Disabling TF32.
[01/17/2024-11:12:06] [TRT] [W] Unused Input: position_ids
[01/17/2024-11:12:06] [TRT] [W] [RemoveDeadLayers] Input Tensor position_ids is unused or used only at compile-time, but is not being removed.
[01/17/2024-11:12:06] [TRT] [I] [MemUsageChange] Init cuBLAS/cuBLASLt: CPU +0, GPU +8, now: CPU 25007, GPU 622 (MiB)
[01/17/2024-11:12:07] [TRT] [I] [MemUsageChange] Init cuDNN: CPU +273, GPU +170, now: CPU 25280, GPU 792 (MiB)
[01/17/2024-11:12:07] [TRT] [W] TensorRT was linked against cuDNN 8.9.6 but loaded cuDNN 8.2.4
[01/17/2024-11:12:07] [TRT] [I] BuilderFlag::kTF32 is set but hardware does not support TF32. Disabling TF32.
[01/17/2024-11:12:07] [TRT] [I] Global timing cache in use. Profiling results in this builder pass will be stored.
[01/17/2024-11:12:07] [TRT] [E] 9: ChatGLMHeadModel/layers/0/pre_norm/PLUGIN_V2_Rmsnorm_0: could not find any supported formats consistent with input/output data types
[01/17/2024-11:12:07] [TRT] [E] 9: [pluginV2Builder.cpp::reportPluginError::24] Error Code 9: Internal Error (ChatGLMHeadModel/layers/0/pre_norm/PLUGIN_V2_Rmsnorm_0: could not find any supported formats consistent with input/output data types)
[01/17/2024-11:12:07] [TRT-LLM] [E] Engine building failed, please check the error log.
[01/17/2024-11:12:07] [TRT-LLM] [I] Config saved to ./chatglm2-6b.trt.f32/config.json.
Traceback (most recent call last):
File "/root/tensort-llm/TensorRT-LLM/examples/chatglm/build.py", line 910, in
Additional notes
If I do not use '--dtype float32', the build succeeds. But I really need float32. Thanks!
Thank you for coming up with this usage!
In fact, all plugins (rmsnorm/layernorm, gemm, gpt_attention) are set to FP16 by default, even when building with --dtype=float32.
If you need an FP32 engine, these arguments need to be added: --dtype=float32 --use_rmsnorm_plugin=float32 --use_gemm_plugin=float32 --use_gpt_attention_plugin=float32
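Combining those flags with the original reproduction command, a full FP32 build invocation would look like the sketch below (the model and output paths are the ones from the reproduction step; flag spellings follow the reply above):

```shell
# Build a fully FP32 ChatGLM2-6B engine: in addition to --dtype,
# each plugin must be forced to float32, since the plugins
# otherwise default to FP16.
python build.py \
    --model_dir ./chatglm2-6b \
    --output_dir ./chatglm2-6b.trt.f32 \
    --dtype=float32 \
    --use_rmsnorm_plugin=float32 \
    --use_gemm_plugin=float32 \
    --use_gpt_attention_plugin=float32
```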
We will add the related usage and an illustration to the README.md.