Add quantization/8bit model loading support for sampling_report.py

Open toiletpapercode opened this issue 1 year ago • 0 comments

Add --quantize to the script call to enable 8-bit model loading.

Tested using:

  1. --quantize --model-name OpenAssistant/oasst-sft-4-pythia-12b-epoch-3.5
  2. --quantize --model-name t5-small --model-type t5conditional
  3. --quantize --model-name OpenAssistant/stablelm-7b-sft-v7-epoch-3
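For readers unfamiliar with how such a flag is typically wired up, here is a minimal sketch. It is not the actual sampling_report.py code: the helper names `build_parser`, `model_load_kwargs`, and `load_model`, and the default `--model-type` value, are hypothetical. It assumes 8-bit loading is done through the `load_in_8bit` and `device_map` keyword arguments of Hugging Face `from_pretrained`, which need `bitsandbytes` and `accelerate` installed; newer transformers releases prefer passing a `BitsAndBytesConfig` instead.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    # Hypothetical subset of the script's options; only the flags named
    # in this issue are modelled here.
    parser = argparse.ArgumentParser(description="sampling report (sketch)")
    parser.add_argument("--model-name", type=str, required=True)
    # Default value is an assumption made for this sketch.
    parser.add_argument("--model-type", type=str, default="causallm")
    parser.add_argument("--quantize", action="store_true",
                        help="load the model weights in 8-bit")
    return parser


def model_load_kwargs(quantize: bool) -> dict:
    # Extra keyword arguments for from_pretrained. 8-bit loading needs a
    # device map so the weights can be placed on the GPU(s).
    if quantize:
        return {"load_in_8bit": True, "device_map": "auto"}
    return {}


def load_model(args: argparse.Namespace):
    # Imported lazily so the argument handling above can be exercised
    # without transformers installed.
    from transformers import AutoModelForCausalLM, AutoModelForSeq2SeqLM

    if args.model_type.lower() == "t5conditional":
        model_cls = AutoModelForSeq2SeqLM
    else:
        model_cls = AutoModelForCausalLM
    return model_cls.from_pretrained(args.model_name,
                                     **model_load_kwargs(args.quantize))
```

Calling `load_model(build_parser().parse_args())` would then download and load the requested checkpoint, in 8-bit when --quantize is given.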

I was unable to test on LLaMA models, since I don't have access to the base weights (and possibly because I only have 35GB of VRAM).

Enjoy, TP

toiletpapercode, Apr 23 '23 14:04