# Add Support for Loading Models in 4-bit Quantized Version (Fixes #1798)
## Why are these changes needed?
This pull request adds support for loading models in a 4-bit quantized format. Quantizing weights to 4 bits substantially reduces GPU memory usage, making model loading and serving feasible in resource-constrained environments; a sketch of the typical loading path is shown below.
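For reference, here is a minimal sketch of 4-bit loading via Hugging Face `transformers` and `bitsandbytes`, assuming the PR threads a load-4bit option through to `BitsAndBytesConfig`. The model path and parameter choices are illustrative, not taken from this PR.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

model_path = "lmsys/vicuna-7b-v1.3"  # placeholder; any HF model path works

# NF4 quantization with fp16 compute is a common default for 4-bit loading.
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.float16,
    bnb_4bit_use_double_quant=True,  # second quantization pass saves extra memory
)

tokenizer = AutoTokenizer.from_pretrained(model_path)
model = AutoModelForCausalLM.from_pretrained(
    model_path,
    quantization_config=bnb_config,
    device_map="auto",  # place layers across available GPUs/CPU
)
```

In FastChat this would presumably sit alongside the existing `--load-8bit` path in the model loading code, exposed as a CLI flag on the serving commands.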
## Related issue number (if applicable)
Closes #1798
## Checks

- [x] I've run `format.sh` to lint the changes in this PR.
- [x] I've included any doc changes needed.
- [x] I've made sure the relevant tests are passing (if applicable).