TensorRT
Feat: Pre-quantized LLM model support
Description
Add support for pre-quantized Hugging Face (HF) models and a post-training quantization (PTQ) option to run_llm.py.
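A minimal sketch of how the two modes might be selected from the command line. The flag names `--quant_format` and `--pre_quantized`, the format choices, and the helper `quantization_mode` are hypothetical and chosen for illustration only; they are not taken from run_llm.py.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Build a parser with illustrative (hypothetical) quantization flags."""
    parser = argparse.ArgumentParser(
        description="Sketch of quantization options for an LLM runner"
    )
    parser.add_argument("--model", required=True, help="HF model name or local path")
    # Hypothetical flag: quantization format to apply via PTQ.
    parser.add_argument(
        "--quant_format",
        choices=["fp8", "nvfp4"],
        default=None,
        help="Apply post-training quantization in this format",
    )
    # Hypothetical flag: the checkpoint already contains quantized weights.
    parser.add_argument(
        "--pre_quantized",
        action="store_true",
        help="Load a checkpoint that was quantized ahead of time",
    )
    return parser


def quantization_mode(args: argparse.Namespace) -> str:
    """Return 'pre_quantized', 'ptq', or 'none' for the parsed arguments."""
    if args.pre_quantized:
        return "pre_quantized"
    if args.quant_format is not None:
        return "ptq"
    return "none"
```

In this sketch a pre-quantized checkpoint takes priority over a PTQ request, so the weights are not quantized a second time; the actual script may resolve the two options differently.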
Fixes # (issue)
Type of change
- New feature (non-breaking change which adds functionality)
Checklist:
- [x] My code follows the style guidelines of this project (You can use the linters)
- [x] I have performed a self-review of my own code
- [x] I have commented my code, particularly in hard-to-understand areas and hacks
- [ ] I have made corresponding changes to the documentation
- [ ] I have added tests to verify my fix or my feature
- [ ] New and existing unit tests pass locally with my changes
- [ ] I have added the relevant labels to my PR so that relevant reviewers are notified
modelopt changed its code structure in 0.35.0. Please make the same changes as in this commit: https://github.com/pytorch/TensorRT/commit/9c520f8a78303f02c551437fc2b5d03093934790
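If both the old and new modelopt layouts need to be supported, one option is to branch on the installed version. The sketch below only shows the version comparison; the helper names `parse_release` and `uses_new_layout` are hypothetical, and the code does not reproduce the changes in the linked commit.

```python
import re
from typing import Tuple


def parse_release(version: str) -> Tuple[int, ...]:
    """Extract the leading numeric release, e.g. '0.35.0rc1' -> (0, 35, 0).

    Pre-release and local suffixes are ignored in this simplified sketch.
    """
    match = re.match(r"\d+(\.\d+)*", version)
    if match is None:
        raise ValueError(f"unrecognized version string: {version!r}")
    parts = tuple(int(p) for p in match.group(0).split("."))
    # Pad to three components so '0.35' compares equal to '0.35.0'.
    return parts + (0,) * (3 - len(parts))


def uses_new_layout(installed: str, threshold: str = "0.35.0") -> bool:
    """Return True if the installed version is at or past the threshold."""
    return parse_release(installed) >= parse_release(threshold)
```

The installed version string could be obtained with `importlib.metadata.version(...)`, assuming the distribution name of the installed modelopt package is known.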