TensorRT-LLM

feat:[AutoDeploy] Add support for Phi3/4 Model Family

Open Fridah-nv opened this issue 9 months ago • 0 comments

Overview

Add a new transformation that unfuses qkv_gemm and gate_up_proj, so the exported graph is standardized and later transformations can be applied to it
Patch the Phi-3 model initialization issue with HF AutoModelForCausalLM
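
As a rough illustration of what unfusing means, the sketch below splits one fused projection weight into separate q, k and v weights and checks that the outputs match. It is not the AutoDeploy transformation itself: plain Python lists stand in for tensors, and the helper names (split_rows, linear) and the tiny shapes are hypothetical. The same row split, with two blocks, would apply to a fused gate_up_proj weight.

```python
# Illustrative only: plain Python lists stand in for tensors.

def split_rows(weight, sizes):
    """Split a [out_features, in_features] weight into row blocks of the given sizes."""
    if sum(sizes) != len(weight):
        raise ValueError("sizes must sum to the number of output rows")
    parts, start = [], 0
    for size in sizes:
        parts.append(weight[start:start + size])
        start += size
    return parts


def linear(weight, x):
    """Compute y = W @ x for a list-of-rows weight and an input vector x."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in weight]


# Hypothetical tiny shapes: hidden size 2, with 2 q rows, 1 k row and 1 v row.
fused_qkv = [[1, 0], [0, 1], [2, 2], [3, -1]]
q_w, k_w, v_w = split_rows(fused_qkv, [2, 1, 1])

x = [5, 7]
fused_out = linear(fused_qkv, x)
# Concatenating the three separate projections reproduces the fused output.
unfused_out = linear(q_w, x) + linear(k_w, x) + linear(v_w, x)
assert fused_out == unfused_out
```

Because a fused linear layer stacks the per-projection weights along the output dimension, slicing along that dimension recovers each projection without changing the numerical result.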

Testing

Unit test for the unfuse_weights transformation
Integration tests for Phi models

TODO: Test perf difference

Future Work:

Support Phi3LongRoPEScaledRotaryEmbedding for Phi-3.5 and other long context models

Support Matrix

Model	compile_backend / runtime / atten_backend	world_size
microsoft/phi-4	{torch-simple,torch-opt}, {trtllm,demollm}, {TritonWithFlattenedInputs,FlashInfer}	1,2,4
microsoft/Phi-3-mini-4k-instruct	{torch-simple,torch-opt}, {trtllm,demollm}, {TritonWithFlattenedInputs}	1,2,4
microsoft/Phi-3-medium-4k-instruct	{torch-simple,torch-opt}, {trtllm,demollm}, {TritonWithFlattenedInputs}	1,2,4
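
One way to read the matrix is as a set of axes whose combinations could each be exercised by an integration test. The sketch below expands the rows above into individual configurations. The dictionary layout and the expand helper are illustrative assumptions, not part of AutoDeploy; only the values are taken from the table.

```python
from itertools import product

# Values copied from the support matrix; the dict layout is an assumption for illustration.
COMMON_AXES = {
    "compile_backend": ["torch-simple", "torch-opt"],
    "runtime": ["trtllm", "demollm"],
}
SUPPORT_MATRIX = {
    "microsoft/phi-4": {
        **COMMON_AXES,
        "atten_backend": ["TritonWithFlattenedInputs", "FlashInfer"],
        "world_size": [1, 2, 4],
    },
    "microsoft/Phi-3-mini-4k-instruct": {
        **COMMON_AXES,
        "atten_backend": ["TritonWithFlattenedInputs"],
        "world_size": [1, 2, 4],
    },
    "microsoft/Phi-3-medium-4k-instruct": {
        **COMMON_AXES,
        "atten_backend": ["TritonWithFlattenedInputs"],
        "world_size": [1, 2, 4],
    },
}


def expand(matrix):
    """Yield (model, compile_backend, runtime, atten_backend, world_size) per combination."""
    for model, axes in matrix.items():
        for combo in product(*axes.values()):
            yield (model, *combo)


configs = list(expand(SUPPORT_MATRIX))
print(len(configs))
```

A list like configs could feed a parametrized test, though whether every combination is worth running is a separate cost decision.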

Mar 25 '25 21:03 Fridah-nv