feat: Refactor LLM model zoo and add KV cache support

Open peri044 opened this issue 6 months ago • 0 comments

Description

This PR redesigns and unifies our LLM model compilation path and fixes output-mismatch and performance issues. It also implements KV caching using native TensorRT.
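KV caching avoids recomputing the key and value projections of earlier tokens at every decoding step. The block below is a minimal, framework-free Python sketch of the general idea (single attention head, plain lists). It is not the TensorRT implementation in this PR, and the function names and toy weights are hypothetical. It compares cached and uncached decoding: both produce the same outputs, but the cached version performs one K/V projection per token instead of one per prefix token per step.

```python
import math
from typing import List, Tuple

Vector = List[float]
Matrix = List[List[float]]


def matvec(w: Matrix, x: Vector) -> Vector:
    """Plain matrix-vector product, standing in for a linear projection."""
    return [sum(wij * xj for wij, xj in zip(row, x)) for row in w]


def attend(q: Vector, keys: List[Vector], values: List[Vector]) -> Vector:
    """Scaled dot-product attention of one query over the given keys/values."""
    scale = 1.0 / math.sqrt(len(q))
    scores = [sum(a * b for a, b in zip(q, k)) * scale for k in keys]
    m = max(scores)  # subtract the max score for numerical stability
    weights = [math.exp(s - m) for s in scores]
    total = sum(weights)
    dim = len(values[0])
    return [sum(w * v[i] for w, v in zip(weights, values)) / total for i in range(dim)]


def decode_without_cache(xs: List[Vector], wq: Matrix, wk: Matrix, wv: Matrix) -> Tuple[List[Vector], int]:
    """Reference: re-project K/V for the whole causal prefix at every step."""
    outputs, kv_projections = [], 0
    for t in range(len(xs)):
        keys = [matvec(wk, x) for x in xs[: t + 1]]
        values = [matvec(wv, x) for x in xs[: t + 1]]
        kv_projections += t + 1
        outputs.append(attend(matvec(wq, xs[t]), keys, values))
    return outputs, kv_projections


def decode_with_cache(xs: List[Vector], wq: Matrix, wk: Matrix, wv: Matrix) -> Tuple[List[Vector], int]:
    """Cached: project only the new token's K/V and append them to the cache."""
    k_cache: List[Vector] = []
    v_cache: List[Vector] = []
    outputs, kv_projections = [], 0
    for x in xs:
        k_cache.append(matvec(wk, x))
        v_cache.append(matvec(wv, x))
        kv_projections += 1
        outputs.append(attend(matvec(wq, x), k_cache, v_cache))
    return outputs, kv_projections


if __name__ == "__main__":
    # Toy 2-dimensional weights and a 4-token input sequence (illustrative only).
    wq = [[1.0, 0.0], [0.0, 1.0]]
    wk = [[0.5, 0.1], [0.2, 0.3]]
    wv = [[1.0, 2.0], [0.0, 1.0]]
    xs = [[0.1, 0.2], [0.3, -0.1], [0.0, 0.5], [0.4, 0.4]]
    ref, n_ref = decode_without_cache(xs, wq, wk, wv)
    out, n_out = decode_with_cache(xs, wq, wk, wv)
    print(n_ref, n_out)  # prints "10 4"
    print(all(math.isclose(a, b) for r, o in zip(ref, out) for a, b in zip(r, o)))  # prints "True"
```

For a sequence of length n, the uncached loop performs n(n+1)/2 K/V projections while the cached loop performs n, which is the saving a KV cache targets during autoregressive generation. Matching outputs between the two paths is also a useful sanity check when debugging output mismatches.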

Fixes # (issue)

Type of change

  • New feature (non-breaking change which adds functionality)
  • This change requires a documentation update

Checklist:

  • [x] My code follows the style guidelines of this project (You can use the linters)
  • [x] I have performed a self-review of my own code
  • [x] I have commented my code, particularly in hard-to-understand areas and hacks
  • [x] I have made corresponding changes to the documentation
  • [x] I have added tests to verify my fix or my feature
  • [x] New and existing unit tests pass locally with my changes
  • [x] I have added the relevant labels to my PR so that relevant reviewers are notified

peri044, May 20 '25 23:05