[Bug]: Models fail with the error "a and b must have same reduction dim"
System Info
- H100
Who can help?
No response
Information
- [ ] The official example scripts
- [ ] My own modified scripts
Tasks
- [ ] An officially supported task in the `examples` folder (such as GLUE/SQuAD, ...)
- [ ] My own task or dataset (give details below)
Reproduction
Run:

```shell
python3 /opt/tensorrt-llm/examples/auto_deploy/build_and_run_ad.py --model JetBrains/Mellum-4b-sft-python --args.model-factory AutoModelForCausalLM '--args.model-kwargs={}' --args.tokenizer null --args.world-size 2 --args.compile-backend torch-compile --args.attn-backend flashinfer --args.runtime trtllm --args.skip-loading-weights False --args.transforms.detect-sharding.simple-shard-only False --args.max-seq-len 512
```
Expected behavior
AD (AutoDeploy) should build an engine and run it.
Actual behavior
Fails with:

```
0: [rank1]:E1122 18:59:53.660000 1739086 torch/_subclasses/fake_tensor.py:2755] raise error_type(message_evaluated)
0: [rank1]:E1122 18:59:53.660000 1739086 torch/_subclasses/fake_tensor.py:2755] RuntimeError: a and b must have same reduction dim, but got [s44*s70, 4128] X [2064, 3072].
0: [11/22/2025-18:59:53] [TRT-LLM] [RANK 1] [E] Failed to initialize executor on rank 1: a and b must have same reduction dim, but got [s44*s70, 4128] X [2064, 3072].
0:
0: While executing %torch_linear_simple_6 : [num_users=1] = call_function[target=torch.ops.auto_deploy.torch_linear_simple.default](args = (%mul_14, %model_layers_0_mlp_down_proj_weight, None), kwargs = {})
```
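For context while triaging: the failure is the standard reduction-dim check for a matmul A[m, k] @ B[k, n] with mismatched k (4128 vs 2064). Since 4128 = 2 × 2064 and the run uses `--args.world-size 2`, it looks like the sharding transform split the `down_proj` weight's input dimension while the activation feeding it kept the full dimension; that interpretation is a guess from the shapes in the log, not confirmed. A minimal pure-Python sketch of the check (illustrative only; this is not TensorRT-LLM's actual implementation):

```python
# Illustration of the reduction-dim check that fires in the log above
# (hypothetical helper; not TensorRT-LLM code).
def matmul_shape(a_shape, b_shape):
    """Return the output shape of A[m, k] @ B[k, n], or raise if the
    shared reduction dim k does not match."""
    m, k_a = a_shape
    k_b, n = b_shape
    if k_a != k_b:
        raise RuntimeError(
            f"a and b must have same reduction dim, "
            f"but got [{m}, {k_a}] X [{k_b}, {n}]."
        )
    return (m, n)

# Shapes from the error log: the activation keeps the full dim 4128,
# while the down_proj weight slice expects 2064 = 4128 / 2 (world size 2),
# so the two reduction dims disagree and the check raises.
try:
    matmul_shape(("s44*s70", 4128), (2064, 3072))
except RuntimeError as e:
    print(e)
```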
Additional notes
Found during a model coverage run.
Before submitting a new issue...
- [x] Make sure you already searched for relevant issues, and checked the documentation and examples for answers to frequently asked questions.