# Add Support for Loading Models in 4-bit Quantized Version (Fixes #1798)
## Why are these changes needed?
This pull request adds support for loading models in a 4-bit quantized format. Quantizing weights to 4 bits substantially reduces GPU memory usage, making model loading and serving feasible in resource-constrained environments; a sketch of the typical loading path is shown below.
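For reference, here is a minimal sketch of 4-bit loading via Hugging Face `transformers` and `bitsandbytes`, assuming the PR threads a load-4bit option through to `BitsAndBytesConfig`. The model path and parameter choices are illustrative, not taken from this PR.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

model_path = "lmsys/vicuna-7b-v1.3"  # placeholder; any HF model path works

# NF4 quantization with fp16 compute is a common default for 4-bit loading.
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.float16,
    bnb_4bit_use_double_quant=True,  # second quantization pass saves extra memory
)

tokenizer = AutoTokenizer.from_pretrained(model_path)
model = AutoModelForCausalLM.from_pretrained(
    model_path,
    quantization_config=bnb_config,
    device_map="auto",  # place layers across available GPUs/CPU
)
```

In FastChat this would presumably sit alongside the existing `--load-8bit` path in the model loading code, exposed as a CLI flag on the serving commands.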
## Related issue number (if applicable)
Closes #1798
## Checks

- [x] I've run `format.sh` to lint the changes in this PR.
- [x] I've included any doc changes needed.
- [x] I've made sure the relevant tests are passing (if applicable).