Open-Assistant
Add quantization/8bit model loading support for sampling_report.py
Pass --quantize on the command line to enable it.
Tested using:
- --quantize --model-name OpenAssistant/oasst-sft-4-pythia-12b-epoch-3.5
- --quantize --model-name t5-small --model-type t5conditional
- --quantize --model-name OpenAssistant/stablelm-7b-sft-v7-epoch-3
Unable to test on llama models without access to the base weights (and/or with only 35 GB of VRAM?)
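For reviewers, a minimal sketch of how an 8-bit --quantize flag is typically mapped onto transformers from_pretrained keyword arguments (this is an illustration of the general pattern, not the exact code in sampling_report.py; the helper name build_load_kwargs is hypothetical):

```python
def build_load_kwargs(quantize: bool) -> dict:
    """Return extra from_pretrained kwargs for normal vs. 8-bit loading.

    Sketch only: sampling_report.py may wire this differently.
    """
    if quantize:
        # load_in_8bit requires the bitsandbytes package and a CUDA GPU;
        # device_map="auto" lets accelerate place the quantized weights.
        return {"load_in_8bit": True, "device_map": "auto"}
    return {}

# Hypothetical usage (needs transformers + bitsandbytes installed):
#   model = AutoModelForCausalLM.from_pretrained(model_name, **build_load_kwargs(True))
```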
Enjoy, TP