feat: Refactor LLM model zoo and add KV cache support

Open peri044 opened this issue 6 months ago • 0 comments

Description

This PR redesigns and unifies our LLM model compilation, and fixes output mismatches and performance issues. It also implements KV caching using native TensorRT.
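For reviewers unfamiliar with KV caching, here is a conceptual sketch in plain Python. It is not the TensorRT implementation in this PR; `project_kv`, `KVCache`, and both decode functions are hypothetical names made up for illustration. It only shows the general idea: at each decode step, compute the key and value for the new token alone and reuse the stored ones for the prefix, which should give the same outputs as recomputing everything.

```python
import math


def attend(q, keys, values):
    """Single-head scaled dot-product attention for one query vector."""
    scale = 1.0 / math.sqrt(len(q))
    scores = [scale * sum(qi * ki for qi, ki in zip(q, k)) for k in keys]
    top = max(scores)  # subtract the max for a numerically stable softmax
    weights = [math.exp(s - top) for s in scores]
    total = sum(weights)
    dim = len(values[0])
    return [sum(w * v[i] for w, v in zip(weights, values)) / total for i in range(dim)]


def project_kv(x):
    # Hypothetical stand-in for the learned key/value projections.
    return [2.0 * xi for xi in x], [xi + 1.0 for xi in x]


class KVCache:
    """Stores keys/values of already-processed tokens (illustrative only)."""

    def __init__(self):
        self.keys = []
        self.values = []

    def step(self, x):
        # Project only the new token, then attend over the whole cache.
        k, v = project_kv(x)
        self.keys.append(k)
        self.values.append(v)
        return attend(x, self.keys, self.values)


def decode_without_cache(tokens):
    # Baseline: re-project the entire prefix at every step.
    outputs = []
    for t in range(1, len(tokens) + 1):
        projected = [project_kv(x) for x in tokens[:t]]
        keys = [k for k, _ in projected]
        values = [v for _, v in projected]
        outputs.append(attend(tokens[t - 1], keys, values))
    return outputs


def decode_with_cache(tokens):
    cache = KVCache()
    return [cache.step(x) for x in tokens]
```

In this toy setup the cached path does one projection per step instead of one per prefix token, so the projection work grows linearly rather than quadratically with sequence length, while the outputs stay the same.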

Fixes # (issue)

Type of change

New feature (non-breaking change which adds functionality)
This change requires a documentation update

Checklist:

[x] My code follows the style guidelines of this project (You can use the linters)
[x] I have performed a self-review of my own code
[x] I have commented my code, particularly in hard-to-understand areas and hacks
[x] I have made corresponding changes to the documentation
[x] I have added tests to verify my fix or my feature
[x] New and existing unit tests pass locally with my changes
[x] I have added the relevant labels to my PR so that relevant reviewers are notified

May 20 '25 23:05 peri044
