Open-Assistant
Add quantization/8bit model loading support for sampling_report.py
Pass --quantize on the command line to enable it.
Tested using:
- --quantize --model-name OpenAssistant/oasst-sft-4-pythia-12b-epoch-3.5
- --quantize --model-name t5-small --model-type t5conditional
- --quantize --model-name OpenAssistant/stablelm-7b-sft-v7-epoch-3
Unable to test on llama models without access to the base weights (and/or with only 35 GB of VRAM?)
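For reviewers, a minimal sketch of how an 8-bit --quantize flag is typically mapped onto transformers from_pretrained keyword arguments (this is an illustration of the general pattern, not the exact code in sampling_report.py; the helper name build_load_kwargs is hypothetical):

```python
def build_load_kwargs(quantize: bool) -> dict:
    """Return extra from_pretrained kwargs for normal vs. 8-bit loading.

    Sketch only: sampling_report.py may wire this differently.
    """
    if quantize:
        # load_in_8bit requires the bitsandbytes package and a CUDA GPU;
        # device_map="auto" lets accelerate place the quantized weights.
        return {"load_in_8bit": True, "device_map": "auto"}
    return {}

# Hypothetical usage (needs transformers + bitsandbytes installed):
#   model = AutoModelForCausalLM.from_pretrained(model_name, **build_load_kwargs(True))
```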
Enjoy, TP