TensorRT-LLM
Cannot build Nougat model
System Info
- RTX 4090
- x86_64 GNU/Linux
- main branch
Who can help?
No response
Information
- [X] The official example scripts
- [ ] My own modified scripts
Tasks
- [ ] An officially supported task in the examples folder (such as GLUE/SQuAD, ...)
- [ ] My own task or dataset (give details below)
Reproduction
Following the instructions for Nougat here: https://github.com/NVIDIA/TensorRT-LLM/tree/main/examples/multimodal#nougat
The error occurs during the build.py step, using the following command:
python ../enc_dec/build.py \
--model_type bart \
--weight_dir tmp/trt_models/${MODEL_NAME}/tp1 \
-o trt_engines/${MODEL_NAME}/1-gpu \
--engine_name $MODEL_NAME \
--use_bert_attention_plugin \
--use_gpt_attention_plugin \
--use_gemm_plugin \
--dtype bfloat16 \
--max_beam_width 1 \
--max_batch_size 1 \
--nougat \
--max_output_len 100 \
--max_multimodal_len 588
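In case it helps triage, here is a minimal sketch for reporting the installed tensorrt_llm wheel alongside the examples commit being used. It assumes the package is importable and that the script is run from inside the git checkout; neither detail is part of the original repro steps.

# Minimal environment report (sketch).
# Assumptions: tensorrt_llm is installed in the current environment and the
# working directory is inside the TensorRT-LLM git checkout.
import subprocess

import tensorrt_llm

print("tensorrt_llm wheel :", tensorrt_llm.__version__)
print("examples commit    :",
      subprocess.check_output(["git", "rev-parse", "--short", "HEAD"],
                              text=True).strip())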
Expected behavior
Model successfully builds
Actual behavior
[02/15/2024-19:53:13] [TRT-LLM] [W] Skipping build of encoder for Nougat model
[02/15/2024-19:53:13] [TRT-LLM] [I] Setting model configuration from tmp/trt_models/nougat-small/tp1.
[02/15/2024-19:53:13] [TRT-LLM] [I] use_bert_attention_plugin set, without specifying a value. Using bfloat16 automatically.
[02/15/2024-19:53:13] [TRT-LLM] [I] use_gpt_attention_plugin set, without specifying a value. Using bfloat16 automatically.
[02/15/2024-19:53:13] [TRT-LLM] [I] use_gemm_plugin set, without specifying a value. Using bfloat16 automatically.
[02/15/2024-19:53:13] [TRT-LLM] [W] Forcing max_encoder_input_len equal to max_prompt_embedding_table_size
[02/15/2024-19:53:13] [TRT-LLM] [I] Serially build TensorRT engines.
[02/15/2024-19:53:13] [TRT] [I] [MemUsageChange] Init CUDA: CPU +13, GPU +0, now: CPU 121, GPU 404 (MiB)
[02/15/2024-19:53:14] [TRT] [I] [MemUsageChange] Init builder kernel library: CPU +1809, GPU +316, now: CPU 2066, GPU 720 (MiB)
[02/15/2024-19:53:14] [TRT-LLM] [W] Invalid timing cache, using freshly created one
[02/15/2024-19:53:14] [TRT-LLM] [I] Loading weights from binary...
[02/15/2024-19:53:14] [TRT-LLM] [I] Weights loaded. Total time: 00:00:00
Traceback (most recent call last):
File "/home/mark/projects/searchresearch/TensorRT-LLM/examples/multimodal/../enc_dec/build.py", line 574, in <module>
run_build(component='decoder')
File "/home/mark/projects/searchresearch/TensorRT-LLM/examples/multimodal/../enc_dec/build.py", line 565, in run_build
build(0, args)
File "/home/mark/projects/searchresearch/TensorRT-LLM/examples/multimodal/../enc_dec/build.py", line 509, in build
engine = build_rank_engine(builder, builder_config, engine_name,
File "/home/mark/projects/searchresearch/TensorRT-LLM/examples/multimodal/../enc_dec/build.py", line 402, in build_rank_engine
network.plugin_config.to_legacy_setting()
AttributeError: 'PluginConfig' object has no attribute 'to_legacy_setting'
Additional notes
It looks like the to_legacy_setting() method doesn't exist on the PluginConfig class that build.py calls into (see the check sketched below).
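A quick way to confirm this from the same environment is to check the installed class directly. This is only a sketch: it assumes PluginConfig is importable from tensorrt_llm.plugin, and the exact module path may differ between TensorRT-LLM versions.

# Sketch: does the installed PluginConfig expose to_legacy_setting?
# Assumption: PluginConfig lives under tensorrt_llm.plugin in this version.
from tensorrt_llm.plugin import PluginConfig

# False here would match the AttributeError raised by build.py.
print(hasattr(PluginConfig, "to_legacy_setting"))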