feat:[AutoDeploy] Add support for Phi3/4 Model Family

Open Fridah-nv opened this issue 9 months ago • 0 comments

Overview

  1. Add a new transformation that unfuses qkv_gemm and gate_up_proj, so that the exported graph is standardized and later transformations can be applied to it
  2. Patch the Phi-3 model init issue with HF AutoModelForCausalLM
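
To illustrate what unfusing means for item 1, here is a minimal sketch, not the AutoDeploy implementation. It assumes the fused weight stores the Q, K, and V blocks (or the gate and up blocks) as consecutive groups of output rows, and it uses plain Python lists in place of tensors. The helper names `linear` and `unfuse_rows` and all dimensions are hypothetical.

```python
# Illustrative only: plain-Python stand-in for tensors, not the AutoDeploy transformation.

def linear(weight, x):
    """Compute y = W @ x for a weight stored as rows: [out_features][in_features]."""
    return [sum(w * v for w, v in zip(row, x)) for row in weight]


def unfuse_rows(fused_weight, sizes):
    """Split a fused weight row-wise into consecutive blocks of the given output sizes."""
    assert sum(sizes) == len(fused_weight), "sizes must cover every output row"
    blocks, start = [], 0
    for size in sizes:
        blocks.append(fused_weight[start:start + size])
        start += size
    return blocks


# Hypothetical toy dimensions (real models are far larger).
hidden = 4
num_heads, num_kv_heads, head_dim = 2, 1, 2
q_size = num_heads * head_dim       # rows belonging to Q
kv_size = num_kv_heads * head_dim   # rows belonging to K, and again to V
intermediate = 3                    # rows belonging to gate, and again to up

x = [1.0, -2.0, 0.5, 3.0]

# Fused QKV weight: Q rows, then K rows, then V rows (assumed layout).
fused_qkv = [[float(r * hidden + c) for c in range(hidden)]
             for r in range(q_size + 2 * kv_size)]
w_q, w_k, w_v = unfuse_rows(fused_qkv, [q_size, kv_size, kv_size])

# Fused gate/up weight: gate rows, then up rows (assumed layout).
fused_gate_up = [[float(r - c) for c in range(hidden)]
                 for r in range(2 * intermediate)]
w_gate, w_up = unfuse_rows(fused_gate_up, [intermediate, intermediate])

# Concatenating the separate projections reproduces the fused projection.
qkv_fused_out = linear(fused_qkv, x)
qkv_unfused_out = linear(w_q, x) + linear(w_k, x) + linear(w_v, x)
gate_up_fused_out = linear(fused_gate_up, x)
gate_up_unfused_out = linear(w_gate, x) + linear(w_up, x)
```

Because each output row depends only on its own weight row, the split changes the graph structure (one GEMM becomes several) without changing the numerical result.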

Testing

  1. Unit test for the unfuse_weights transformation
  2. Integration tests for the Phi models

TODO: Test perf difference

Future Work:

  1. Support Phi3LongRoPEScaledRotaryEmbedding for Phi-3.5 and other long context models

Support Matrix

| Model | compile_backend / runtime / atten_backend | world_size |
| --- | --- | --- |
| microsoft/phi-4 | {torch-simple,torch-opt}, {trtllm,demollm}, {TritonWithFlattenedInputs,FlashInfer} | 1,2,4 |
| microsoft/Phi-3-mini-4k-instruct | {torch-simple,torch-opt}, {trtllm,demollm}, {TritonWithFlattenedInputs} | 1,2,4 |
| microsoft/Phi-3-medium-4k-instruct | {torch-simple,torch-opt}, {trtllm,demollm}, {TritonWithFlattenedInputs} | 1,2,4 |
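
One way to turn the support matrix into concrete test configurations is sketched below. It assumes that each brace group lists alternatives and that every combination across the groups is supported, which the matrix implies but does not state outright. The `SUPPORT_MATRIX` dictionary and the `expand` helper are hypothetical names for illustration, not part of AutoDeploy.

```python
from itertools import product

# Hypothetical encoding of the support matrix; the values are copied from the table.
SUPPORT_MATRIX = {
    "microsoft/phi-4": {
        "compile_backend": ["torch-simple", "torch-opt"],
        "runtime": ["trtllm", "demollm"],
        "atten_backend": ["TritonWithFlattenedInputs", "FlashInfer"],
        "world_size": [1, 2, 4],
    },
    "microsoft/Phi-3-mini-4k-instruct": {
        "compile_backend": ["torch-simple", "torch-opt"],
        "runtime": ["trtllm", "demollm"],
        "atten_backend": ["TritonWithFlattenedInputs"],
        "world_size": [1, 2, 4],
    },
    "microsoft/Phi-3-medium-4k-instruct": {
        "compile_backend": ["torch-simple", "torch-opt"],
        "runtime": ["trtllm", "demollm"],
        "atten_backend": ["TritonWithFlattenedInputs"],
        "world_size": [1, 2, 4],
    },
}


def expand(matrix):
    """Return one dict per (model, option combination), assuming a full cross product."""
    configs = []
    for model, axes in matrix.items():
        names = list(axes)
        for values in product(*(axes[name] for name in names)):
            configs.append({"model": model, **dict(zip(names, values))})
    return configs


configs = expand(SUPPORT_MATRIX)
```

Under that assumption, phi-4 expands to 24 configurations and each Phi-3 model to 12, for 48 in total; a list like `configs` could feed a parametrized integration test.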

Fridah-nv, Mar 25 '25 21:03