GPTQ-for-LLaMa

How to quantize BLOOM after LoRA/P-tuning?

Open · moonlightian opened this issue 2 years ago · 0 comments

I finetuned BLOOM with LoRA and would like to quantize the model with GPTQ:

```python
from transformers import AutoModelForCausalLM
from peft import PeftModelForCausalLM

# load the base model
self.model = AutoModelForCausalLM.from_pretrained(
    self.config['checkpoint_path'],
    device_map='auto',
)
# load the adapter
self.model = PeftModelForCausalLM.from_pretrained(self.model, '/tmp/bloom_ori/lora_bloom')
```

Some errors happened like:

[screenshot of the error traceback omitted]

It seems that after loading the adapter, there is a dimension mismatch between `alibi` and `attention_mask`. How could I get rid of these errors and quantize the model together with the adapter?

moonlightian · Jun 05 '23 06:06
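
A minimal sketch of one common workaround, not from the issue itself: merge the LoRA adapter weights into the base BLOOM model with PEFT's `merge_and_unload()`, save the merged checkpoint, and then point GPTQ at that plain checkpoint. This way the quantizer never sees the PEFT wrapper modules, which should sidestep the `alibi`/`attention_mask` shape error in the wrapped forward pass. All paths here are hypothetical placeholders, and this assumes a LoRA adapter (merging is not available for P-tuning prompt embeddings).

```python
from transformers import AutoModelForCausalLM
from peft import PeftModel

# load the base model (placeholder path standing in for config['checkpoint_path'])
base = AutoModelForCausalLM.from_pretrained('/tmp/bloom_ori/base')

# wrap it with the trained LoRA adapter
model = PeftModel.from_pretrained(base, '/tmp/bloom_ori/lora_bloom')

# fold the adapter weights into the base weights and drop the PEFT wrappers;
# the result is a plain transformers model
merged = model.merge_and_unload()

# save the merged checkpoint, then run GPTQ quantization on this directory
merged.save_pretrained('/tmp/bloom_merged')  # hypothetical output path
```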