Feat: Pre-quantized LLM model support

Open keehyuna opened this issue 5 months ago • 1 comment

Description

Add support for pre-quantized Hugging Face (HF) models and a post-training quantization (PTQ) option to run_llm.py.

Fixes # (issue)

Type of change

  • New feature (non-breaking change which adds functionality)

Checklist:

  • [x] My code follows the style guidelines of this project (You can use the linters)
  • [x] I have performed a self-review of my own code
  • [x] I have commented my code, particularly in hard-to-understand areas and hacks
  • [ ] I have made corresponding changes to the documentation
  • [ ] I have added tests to verify my fix or my feature
  • [ ] New and existing unit tests pass locally with my changes
  • [ ] I have added the relevant labels to my PR so that relevant reviewers are notified

keehyuna Aug 01 '25 00:08

modelopt changed its code structure in 0.35.0. Please make the same changes as in this commit: https://github.com/pytorch/TensorRT/commit/9c520f8a78303f02c551437fc2b5d03093934790

lanluo-nvidia Sep 19 '25 16:09