TensorRT-LLM
ChatGLMHeadModel/layers/0/pre_norm/PLUGIN_V2_Rmsnorm_0: could not find any supported formats consistent with input/output data types
System Info
GPU: V100, CUDA 11.4
Who can help?
No response
Information
- [X] The official example scripts
- [ ] My own modified scripts
Tasks
- [X] An officially supported task in the examples folder (such as GLUE/SQuAD, ...)
- [ ] My own task or dataset (give details below)
Reproduction
python build.py --model_dir ./chatglm2-6b --output_dir ./chatglm2-6b.trt.f32 --dtype float32
Expected behavior
The engine builds successfully.
Actual behavior
[01/17/2024-11:12:06] [TRT] [I] BuilderFlag::kTF32 is set but hardware does not support TF32. Disabling TF32.
[01/17/2024-11:12:06] [TRT] [W] Unused Input: position_ids
[01/17/2024-11:12:06] [TRT] [W] [RemoveDeadLayers] Input Tensor position_ids is unused or used only at compile-time, but is not being removed.
[01/17/2024-11:12:06] [TRT] [I] [MemUsageChange] Init cuBLAS/cuBLASLt: CPU +0, GPU +8, now: CPU 25007, GPU 622 (MiB)
[01/17/2024-11:12:07] [TRT] [I] [MemUsageChange] Init cuDNN: CPU +273, GPU +170, now: CPU 25280, GPU 792 (MiB)
[01/17/2024-11:12:07] [TRT] [W] TensorRT was linked against cuDNN 8.9.6 but loaded cuDNN 8.2.4
[01/17/2024-11:12:07] [TRT] [I] BuilderFlag::kTF32 is set but hardware does not support TF32. Disabling TF32.
[01/17/2024-11:12:07] [TRT] [I] Global timing cache in use. Profiling results in this builder pass will be stored.
[01/17/2024-11:12:07] [TRT] [E] 9: ChatGLMHeadModel/layers/0/pre_norm/PLUGIN_V2_Rmsnorm_0: could not find any supported formats consistent with input/output data types
[01/17/2024-11:12:07] [TRT] [E] 9: [pluginV2Builder.cpp::reportPluginError::24] Error Code 9: Internal Error (ChatGLMHeadModel/layers/0/pre_norm/PLUGIN_V2_Rmsnorm_0: could not find any supported formats consistent with input/output data types)
[01/17/2024-11:12:07] [TRT-LLM] [E] Engine building failed, please check the error log.
[01/17/2024-11:12:07] [TRT-LLM] [I] Config saved to ./chatglm2-6b.trt.f32/config.json.
Traceback (most recent call last):
File "/root/tensort-llm/TensorRT-LLM/examples/chatglm/build.py", line 910, in
Additional notes
If I do not use '--dtype float32', the build succeeds. But I really need float32. Thanks!
Thank you for coming up with this usage!
In fact, all plugins (rmsnorm/layernorm, gemm, gpt_attention) are set to FP16 by default, even when building with --dtype=float32.
If you need an FP32 engine, these arguments need to be added: --dtype=float32 --use_rmsnorm_plugin=float32 --use_gemm_plugin=float32 --use_gpt_attention_plugin=float32
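Combining those flags with the original reproduction command, a full FP32 build invocation would look like the sketch below (the model and output paths are the ones from the reproduction step; flag spellings follow the reply above):

```shell
# Build a fully FP32 ChatGLM2-6B engine: in addition to --dtype,
# each plugin must be forced to float32, since the plugins
# otherwise default to FP16.
python build.py \
    --model_dir ./chatglm2-6b \
    --output_dir ./chatglm2-6b.trt.f32 \
    --dtype=float32 \
    --use_rmsnorm_plugin=float32 \
    --use_gemm_plugin=float32 \
    --use_gpt_attention_plugin=float32
```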
We will add the related usage and an illustration to the README.md.