
fix: Fix converting EXAONE when using model_weights_loader

Open · lkm2835 opened this issue 9 months ago · 1 comment

Converting EXAONE without setting TRTLLM_DISABLE_UNIFIED_CONVERTER=1 causes the error below. I reproduced it with EXAONE-3.5-32B-Instruct, 2.4B, and 7.8B.

The check fails because model_name = hf_model_or_dir, which does not contain "exaone" unless the checkpoint directory happens to be named that way.

If the model is EXAONE, config.architecture is always ExaoneForCausalLM, so the architecture is the reliable thing to test.
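
A quick way to confirm this (the path below is hypothetical): the architecture is declared in the checkpoint's config.json regardless of how the directory is named.

```python
import json
import os

# Hypothetical local copy of an EXAONE checkpoint whose directory name
# does not contain "exaone".
model_dir = "hf_models/my-local-checkpoint"

with open(os.path.join(model_dir, "config.json")) as f:
    cfg = json.load(f)

# Prints ['ExaoneForCausalLM'] for EXAONE models, even though a path-based
# "exaone" check on the directory name would fail.
print(cfg["architectures"])
```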

Error Log

2025-03-10 15:34:22,249 - INFO - flashinfer.jit: Prebuilt kernels not found, using JIT backend
[TensorRT-LLM] TensorRT-LLM version: 0.18.0.dev2025022500
0.18.0.dev2025022500
[03/10/2025-15:34:22] [TRT-LLM] [W] Implicitly setting LLaMAConfig.has_partial_lora_mask = False
[03/10/2025-15:34:22] [TRT-LLM] [W] Implicitly setting LLaMAConfig.tie_word_embeddings = False
5it [00:01,  3.82it/s]
Traceback (most recent call last):
  File "/app/tensorrt_llm/examples/llama/convert_checkpoint.py", line 587, in <module>
    main()
  File "/app/tensorrt_llm/examples/llama/convert_checkpoint.py", line 579, in main
    convert_and_save_hf(args)
  File "/app/tensorrt_llm/examples/llama/convert_checkpoint.py", line 520, in convert_and_save_hf
    execute(args.workers, [convert_and_save_rank] * world_size, args)
  File "/app/tensorrt_llm/examples/llama/convert_checkpoint.py", line 527, in execute
    f(args, rank)
  File "/app/tensorrt_llm/examples/llama/convert_checkpoint.py", line 502, in convert_and_save_rank
    llama = LLaMAForCausalLM.from_hugging_face(
            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/dist-packages/tensorrt_llm/models/llama/model.py", line 502, in from_hugging_face
    loader.generate_tllm_weights(model, arg_dict)
  File "/usr/local/lib/python3.12/dist-packages/tensorrt_llm/models/model_weights_loader.py", line 396, in generate_tllm_weights
    self.load(tllm_key,
  File "/usr/local/lib/python3.12/dist-packages/tensorrt_llm/models/model_weights_loader.py", line 307, in load
    v = sub_module.postprocess(tllm_key, v, **postprocess_kwargs)
        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/dist-packages/tensorrt_llm/layers/linear.py", line 428, in postprocess
    weights = torch.cat(weights)
              ^^^^^^^^^^^^^^^^^^
TypeError: expected Tensor as element 0 in argument 0, but got NoneType
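
For context, this TypeError is what torch.cat raises when an element of the weight list is None, i.e. a weight slot that was never filled in. A minimal standalone reproduction of just that error (independent of EXAONE itself):

```python
import torch

# A weight that was never loaded stays None; torch.cat rejects it with
# the same message seen in the traceback above.
weights = [None, torch.zeros(2, 2)]
torch.cat(weights)
# TypeError: expected Tensor as element 0 in argument 0, but got NoneType
```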

lkm2835 · Mar 10 '25 15:03

@byshiue can you help review this MR?

Thanks, June

juney-nvidia · Mar 24 '25 05:03

Hi @byshiue, can you check this MR?

If "exaone" is in model_name.lower(), then --model_dir should point to a path that includes "exaone". For example: export HF_MODEL_DIR=hf_models/exaone

On the other hand, if "exaone" is in config.architecture.lower(), then --model_dir doesn't need to be considered, since the model architecture is already specified in config.json.
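
A minimal sketch of the two checks being compared (not the actual TensorRT-LLM code; model_name and config stand in for the converter's variables):

```python
def is_exaone_by_path(model_name: str) -> bool:
    # Current check: depends on the checkpoint directory name, e.g. it passes
    # for hf_models/exaone but fails for an arbitrarily named local copy of
    # the same weights.
    return "exaone" in model_name.lower()


def is_exaone_by_architecture(config) -> bool:
    # Proposed check: config.architecture is "ExaoneForCausalLM" for every
    # EXAONE checkpoint, so the directory name no longer matters.
    return "exaone" in config.architecture.lower()
```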

lkm2835 · May 26 '25 14:05

The same path-based condition is used here too:

https://github.com/NVIDIA/TensorRT-LLM/blob/eb2d51a42990b8d0b30bc6c29fad4fd491da749f/tensorrt_llm/models/llama/convert.py#L419-L420

lkm2835 · Jun 02 '25 12:06

Hi @lkm2835, are you still facing the same issue? When I tested on ToT (top-of-tree) main, the issue seems to be gone.

yechank-nvidia · Jul 14 '25 03:07