[Bug]: Models fail with the error "a and b must have same reduction dim"
System Info
- H100
Who can help?
No response
Information
- [ ] The official example scripts
- [ ] My own modified scripts
Tasks
- [ ] An officially supported task in the `examples` folder (such as GLUE/SQuAD, ...)
- [ ] My own task or dataset (give details below)
Reproduction
Run:

```shell
python3 /opt/tensorrt-llm/examples/auto_deploy/build_and_run_ad.py --model JetBrains/Mellum-4b-sft-python --args.model-factory AutoModelForCausalLM '--args.model-kwargs={}' --args.tokenizer null --args.world-size 2 --args.compile-backend torch-compile --args.attn-backend flashinfer --args.runtime trtllm --args.skip-loading-weights False --args.transforms.detect-sharding.simple-shard-only False --args.max-seq-len 512
```
Expected behavior
AD (AutoDeploy) should build an engine and run it.
Actual behavior
Fails with:

```
0: [rank1]:E1122 18:59:53.660000 1739086 torch/_subclasses/fake_tensor.py:2755] raise error_type(message_evaluated)
0: [rank1]:E1122 18:59:53.660000 1739086 torch/_subclasses/fake_tensor.py:2755] RuntimeError: a and b must have same reduction dim, but got [s44*s70, 4128] X [2064, 3072].
0: [11/22/2025-18:59:53] [TRT-LLM] [RANK 1] [E] Failed to initialize executor on rank 1: a and b must have same reduction dim, but got [s44*s70, 4128] X [2064, 3072].
0:
0: While executing %torch_linear_simple_6 : [num_users=1] = call_function[target=torch.ops.auto_deploy.torch_linear_simple.default](args = (%mul_14, %model_layers_0_mlp_down_proj_weight, None), kwargs = {})
```
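For context while triaging: the failure is the standard reduction-dim check for a matmul A[m, k] @ B[k, n] with mismatched k (4128 vs 2064). Since 4128 = 2 × 2064 and the run uses `--args.world-size 2`, it looks like the sharding transform split the `down_proj` weight's input dimension while the activation feeding it kept the full dimension; that interpretation is a guess from the shapes in the log, not confirmed. A minimal pure-Python sketch of the check (illustrative only; this is not TensorRT-LLM's actual implementation):

```python
# Illustration of the reduction-dim check that fires in the log above
# (hypothetical helper; not TensorRT-LLM code).
def matmul_shape(a_shape, b_shape):
    """Return the output shape of A[m, k] @ B[k, n], or raise if the
    shared reduction dim k does not match."""
    m, k_a = a_shape
    k_b, n = b_shape
    if k_a != k_b:
        raise RuntimeError(
            f"a and b must have same reduction dim, "
            f"but got [{m}, {k_a}] X [{k_b}, {n}]."
        )
    return (m, n)

# Shapes from the error log: the activation keeps the full dim 4128,
# while the down_proj weight slice expects 2064 = 4128 / 2 (world size 2),
# so the two reduction dims disagree and the check raises.
try:
    matmul_shape(("s44*s70", 4128), (2064, 3072))
except RuntimeError as e:
    print(e)
```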
Additional notes
Found during a model coverage run.
Before submitting a new issue...
- [x] Make sure you already searched for relevant issues, and checked the documentation and examples for answers to frequently asked questions.