TensorRT-LLM
feat:[AutoDeploy] Add support for Phi3/4 Model Family
Overview
- New transformation to unfuse `qkv_gemm` and `gate_up_proj`, which standardizes the exported graph so that later transformations can be applied
- Patch a Phi-3 model init issue with HF `AutoModelForCausalLM`
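
For reviewers unfamiliar with the idea, here is a minimal sketch of what unfusing a fused projection weight means. It is not the transformation added in this PR: it uses plain Python lists instead of tensors, the helpers `split_rows` and `matvec` are hypothetical, and the dimensions are toy values. It assumes the fused layouts used by the Hugging Face Phi-3 modeling code, namely `[q; k; v]` for the fused QKV projection and `[gate; up]` for `gate_up_proj`, both concatenated along the output dimension.

```python
from typing import List, Sequence

# A weight matrix as rows of floats: rows = output features, cols = input features.
Matrix = List[List[float]]


def split_rows(weight: Matrix, sizes: Sequence[int]) -> List[Matrix]:
    """Split a fused weight into consecutive row blocks of the given sizes."""
    if sum(sizes) != len(weight):
        raise ValueError(f"sizes {list(sizes)} do not sum to {len(weight)} rows")
    parts, start = [], 0
    for size in sizes:
        parts.append(weight[start:start + size])
        start += size
    return parts


def matvec(weight: Matrix, x: Sequence[float]) -> List[float]:
    """Plain matrix-vector product, standing in for a bias-free linear layer."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in weight]


# Toy dimensions (hypothetical, chosen only to keep the example small).
hidden, num_heads, num_kv_heads, head_dim, intermediate = 4, 2, 1, 2, 3
q_size = num_heads * head_dim       # rows belonging to Q
kv_size = num_kv_heads * head_dim   # rows belonging to K, and to V

# Fused QKV weight with assumed layout [q; k; v] along the output dimension.
fused_qkv = [[float(r * hidden + c) for c in range(hidden)]
             for r in range(q_size + 2 * kv_size)]
q_w, k_w, v_w = split_rows(fused_qkv, [q_size, kv_size, kv_size])

# Fused gate/up weight with assumed layout [gate; up].
fused_gate_up = [[float(r - c) for c in range(hidden)]
                 for r in range(2 * intermediate)]
gate_w, up_w = split_rows(fused_gate_up, [intermediate, intermediate])

# Unfusing must not change the result: concatenating the outputs of the
# separate projections reproduces the output of the fused projection.
x = [1.0, -2.0, 0.5, 3.0]
qkv_fused_out = matvec(fused_qkv, x)
qkv_unfused_out = matvec(q_w, x) + matvec(k_w, x) + matvec(v_w, x)
gate_up_fused_out = matvec(fused_gate_up, x)
gate_up_unfused_out = matvec(gate_w, x) + matvec(up_w, x)
```

After the split, each projection is an ordinary linear layer, which is the form that graph transformations written for unfused models typically expect.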
Testing
- unit test for the `unfuse_weights` transformation
- integration tests for Phi models
TODO: Test perf difference
Future Work:
- Support `Phi3LongRoPEScaledRotaryEmbedding` for Phi-3.5 and other long-context models
Support Matrix
| Model | compile_backend / runtime / atten_backend | world_size |
|---|---|---|
| microsoft/phi-4 | {torch-simple,torch-opt}, {trtllm,demollm}, {TritonWithFlattenedInputs,FlashInfer} | 1,2,4 |
| microsoft/Phi-3-mini-4k-instruct | {torch-simple,torch-opt}, {trtllm,demollm}, {TritonWithFlattenedInputs} | 1,2,4 |
| microsoft/Phi-3-medium-4k-instruct | {torch-simple,torch-opt}, {trtllm,demollm}, {TritonWithFlattenedInputs} | 1,2,4 |