TensorRT-LLM
refactor: [AutoDeploy] Enhance RoPE support
- [X] tests for mapping the flashinfer RoPE op to the current RoPE implementation
- [ ] pattern-match RoPE in the graph and map it to the flashinfer op
- [ ] late fusion of attention and RoPE, and compare perf
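
For reference, the kind of equivalence check the first item describes can be sketched in plain Python. This is a minimal illustration, not the AutoDeploy test and not the flashinfer API: every function name below is hypothetical, and which layout each real implementation expects is an assumption to verify against its documentation. The sketch compares the two common RoPE layouts, rotate-half (pairs `x[i]` with `x[i + d/2]`) and interleaved (pairs `x[2i]` with `x[2i + 1]`), and checks that they agree once the input and output are permuted consistently.

```python
import math


def rope_angles(pos, head_dim, base=10000.0):
    """Return (cos, sin) lists of length head_dim // 2 for one token position."""
    half = head_dim // 2
    freqs = [base ** (-2.0 * i / head_dim) for i in range(half)]
    return [math.cos(pos * f) for f in freqs], [math.sin(pos * f) for f in freqs]


def rope_rotate_half(x, pos, base=10000.0):
    """Hypothetical reference: rotate-half layout, pairing x[i] with x[i + half]."""
    half = len(x) // 2
    cos, sin = rope_angles(pos, len(x), base)
    out = [0.0] * len(x)
    for i in range(half):
        a, b = x[i], x[i + half]
        out[i] = a * cos[i] - b * sin[i]
        out[i + half] = b * cos[i] + a * sin[i]
    return out


def rope_interleaved(x, pos, base=10000.0):
    """Hypothetical reference: interleaved layout, pairing x[2i] with x[2i + 1]."""
    half = len(x) // 2
    cos, sin = rope_angles(pos, len(x), base)
    out = [0.0] * len(x)
    for i in range(half):
        a, b = x[2 * i], x[2 * i + 1]
        out[2 * i] = a * cos[i] - b * sin[i]
        out[2 * i + 1] = b * cos[i] + a * sin[i]
    return out


def to_interleaved(x):
    """Permute a rotate-half vector [a0..a_{h-1}, b0..b_{h-1}] to [a0, b0, a1, b1, ...]."""
    half = len(x) // 2
    out = []
    for i in range(half):
        out.extend([x[i], x[i + half]])
    return out


def max_abs_diff(u, v):
    """Largest elementwise absolute difference between two equal-length vectors."""
    return max(abs(a - b) for a, b in zip(u, v))


if __name__ == "__main__":
    q = [0.5, -1.0, 2.0, 3.0, 0.25, -0.75, 1.5, -2.5]
    for pos in (0, 1, 17):
        expected = to_interleaved(rope_rotate_half(q, pos))
        actual = rope_interleaved(to_interleaved(q), pos)
        assert max_abs_diff(expected, actual) < 1e-12
```

A real test would replace one side with the fused op and the other with the existing implementation, and would use a tolerance suited to the tensor dtype rather than the tight float64 bound used here.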