Feat: Pre-quantized LLM model support

Open keehyuna opened this issue 5 months ago • 1 comment

Description

Add support for pre-quantized Hugging Face (HF) models and a post-training quantization (PTQ) option to run_llm.py.
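
For readers unfamiliar with how a pre-quantized HF checkpoint can be told apart from a regular one, here is a minimal sketch. It is not the code in this PR. It assumes the checkpoint's `config.json` carries a `quantization_config` entry with a `quant_method` field, which is the usual Hugging Face convention; the helper names `load_config` and `detect_quant_method` are hypothetical.

```python
import json
from pathlib import Path
from typing import Optional


def load_config(model_dir: str) -> dict:
    """Read config.json from a local checkpoint directory (hypothetical helper)."""
    return json.loads((Path(model_dir) / "config.json").read_text())


def detect_quant_method(config: dict) -> Optional[str]:
    """Return the quantization method named in the config, or None.

    Assumption: pre-quantized checkpoints store a dict under
    "quantization_config"; a missing or non-dict entry is treated as
    "not pre-quantized".
    """
    quant_cfg = config.get("quantization_config")
    if not isinstance(quant_cfg, dict):
        return None
    # Some exporters may omit quant_method; report that case explicitly.
    return quant_cfg.get("quant_method", "unknown")


if __name__ == "__main__":
    # In-memory example configs, so no checkpoint download is needed.
    plain = {"model_type": "llama"}
    quantized = {
        "model_type": "llama",
        "quantization_config": {"quant_method": "modelopt"},
    }
    print(detect_quant_method(plain))
    print(detect_quant_method(quantized))
```

A script could use a check like this to decide whether to load the checkpoint as-is or to run PTQ first; which path run_llm.py actually takes is defined by the PR's diff, not by this sketch.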

Fixes # (issue)

Type of change

New feature (non-breaking change which adds functionality)

Checklist:

[x] My code follows the style guidelines of this project (You can use the linters)
[x] I have performed a self-review of my own code
[x] I have commented my code, particularly in hard-to-understand areas and hacks
[ ] I have made corresponding changes to the documentation
[ ] I have added tests to verify my fix or my feature
[ ] New and existing unit tests pass locally with my changes
[ ] I have added the relevant labels to my PR so that relevant reviewers are notified

Aug 01 '25 00:08 keehyuna

modelopt changed its code structure in 0.35.0. Please make the same changes as in this commit: https://github.com/pytorch/TensorRT/commit/9c520f8a78303f02c551437fc2b5d03093934790
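
If both the old and new modelopt layouts need to be supported, one option is to gate on the installed version. The sketch below only shows the version check; it does not reproduce the linked commit, and the imports that actually moved should be taken from that commit. The distribution name `nvidia-modelopt` and the helper names are assumptions.

```python
import re
from importlib import metadata


def version_tuple(version: str) -> tuple:
    """Parse the leading 'X.Y.Z' of a version string into a tuple of ints.

    Suffixes such as 'rc1' or '.post1' are ignored; missing parts become 0.
    """
    parts = []
    for piece in version.split(".")[:3]:
        match = re.match(r"\d+", piece)
        parts.append(int(match.group()) if match else 0)
    while len(parts) < 3:
        parts.append(0)
    return tuple(parts)


def is_at_least(installed: str, minimum: str) -> bool:
    """True if the installed version is greater than or equal to minimum."""
    return version_tuple(installed) >= version_tuple(minimum)


def modelopt_uses_new_layout(dist_name: str = "nvidia-modelopt") -> bool:
    """Return True if the installed modelopt is 0.35.0 or newer.

    Assumption: the package is installed under the distribution name
    'nvidia-modelopt'. Returns False when it is not installed at all.
    """
    try:
        installed = metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return False
    return is_at_least(installed, "0.35.0")
```

A caller would branch on `modelopt_uses_new_layout()` and import from the new or old module paths accordingly.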

Sep 19 '25 16:09 lanluo-nvidia