Llama-3-8B
I have obtained the weight offset factors for Llama-3-8B, but I ran into an unusual mismatch issue during compression.
My scaling-factor code is unchanged, yet a dimension error appears as soon as compression starts.
The parameter settings are as follows:
--model
Llama-3-8b/
--epochs
20
--output_dir
llama-3-8b-w6a6/
--eval_ppl
--wbits
6
--abits
6
--lwc
--let
--net
Llama-3-8b
--tasks
arc_easy,arc_challenge,boolq,hellaswag,winogrande,piqa
With w=16, a=16 (i.e., no quantization) I can obtain the expected values, but as soon as the compression setting is applied (w=6, a=6), the problems appear.
@hsb1995 LLaMA-3-8B uses GQA (Group Query Attention), which is not supported by current ‘let’.
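(For anyone hitting the same error: below is a rough, hypothetical sketch of where the dimension mismatch can come from. It is not OmniQuant's actual code; the numbers are the standard Llama-3-8B config values, and it assumes the LET q/k smoothing scale is sized for the query projection's output channels, as in the non-GQA models.)

```python
import torch

# Standard Llama-3-8B attention shapes (hidden_size=4096, 32 query heads,
# 8 key/value heads under GQA); head_dim = 4096 // 32 = 128.
hidden_size = 4096
num_attention_heads = 32
num_key_value_heads = 8
head_dim = hidden_size // num_attention_heads

q_out = num_attention_heads * head_dim    # 4096 output channels in q_proj
kv_out = num_key_value_heads * head_dim   # 1024 output channels in k_proj

# A LET-style q/k smoothing scale sized for q_proj's outputs no longer
# matches k_proj's outputs once GQA shrinks the number of KV heads.
scale = torch.ones(q_out)
k_weight = torch.randn(kv_out, hidden_size)
try:
    _ = k_weight * scale.unsqueeze(1)     # 1024 rows vs a 4096-element scale
except RuntimeError as err:
    print("dimension mismatch, as expected with GQA:", err)
```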
Professor, thank you for your great work. I don't really understand how the GQA issue you mentioned should be handled.
Do I understand you correctly that I keep the original "generateAct_scale.shift" file unchanged to obtain the "act_scales" and "act_shifts" files,
and then perform only the weight quantization afterwards? (See the cache-inspection sketch after this post.)
Parameter settings:
CUDA_VISIBLE_DEVICES=0 python main.py \
--model /PATH/TO/LLaMA/llama-8b \
--epochs 20 --output_dir ./log/llama-8b-w6a6 \
--eval_ppl --wbits 6 --abits 6 --lwc
Is the above operation possible?
I only removed the --let option.
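(Side note on the act_scales / act_shifts question above: a small, hypothetical way to inspect those caches, assuming they are torch-saved dicts of per-module tensors; the file paths below are placeholders.)

```python
import torch

# Hypothetical paths -- adjust to wherever the generation step wrote the caches.
act_scales = torch.load("./act_scales/llama-3-8b.pt", map_location="cpu")
act_shifts = torch.load("./act_shifts/llama-3-8b.pt", map_location="cpu")

# Each entry should be a per-channel tensor keyed by module name; checking a
# few shapes is a quick way to see whether the GQA projections look consistent.
for name, scale in list(act_scales.items())[:5]:
    print(name, tuple(scale.shape))
print("shift entries:", len(act_shifts))
```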
Hey, professor. I gave it a try, but it is really hard to change. The current errors are as follows. What should I do when I encounter these?
[2024-04-24 17:14:17 root](omniquant.py 50): INFO Starting ...
Some weights of LlamaForCausalLM were not initialized from the model checkpoint at /home/sam/Doctorproject/weight/llama-3-8b/LLM-Research/Llama-3-8b/ and are newly initialized: ['model.layers.17.self_attn.rotary_emb.inv_freq', 'model.layers.1.self_attn.rotary_emb.inv_freq', 'model.layers.3.self_attn.rotary_emb.inv_freq', 'model.layers.4.self_attn.rotary_emb.inv_freq', 'model.layers.16.self_attn.rotary_emb.inv_freq', 'model.layers.31.self_attn.rotary_emb.inv_freq', 'model.layers.21.self_attn.rotary_emb.inv_freq', 'model.layers.10.self_attn.rotary_emb.inv_freq', 'model.layers.24.self_attn.rotary_emb.inv_freq', 'model.layers.28.self_attn.rotary_emb.inv_freq', 'model.layers.11.self_attn.rotary_emb.inv_freq', 'model.layers.13.self_attn.rotary_emb.inv_freq', 'model.layers.14.self_attn.rotary_emb.inv_freq', 'model.layers.15.self_attn.rotary_emb.inv_freq', 'model.layers.2.self_attn.rotary_emb.inv_freq', 'model.layers.20.self_attn.rotary_emb.inv_freq', 'model.layers.27.self_attn.rotary_emb.inv_freq', 'model.layers.0.self_attn.rotary_emb.inv_freq', 'model.layers.7.self_attn.rotary_emb.inv_freq', 'model.layers.6.self_attn.rotary_emb.inv_freq', 'model.layers.9.self_attn.rotary_emb.inv_freq', 'model.layers.29.self_attn.rotary_emb.inv_freq', 'model.layers.26.self_attn.rotary_emb.inv_freq', 'model.layers.22.self_attn.rotary_emb.inv_freq', 'model.layers.19.self_attn.rotary_emb.inv_freq', 'model.layers.12.self_attn.rotary_emb.inv_freq', 'model.layers.8.self_attn.rotary_emb.inv_freq', 'model.layers.30.self_attn.rotary_emb.inv_freq', 'model.layers.25.self_attn.rotary_emb.inv_freq', 'model.layers.5.self_attn.rotary_emb.inv_freq', 'model.layers.18.self_attn.rotary_emb.inv_freq', 'model.layers.23.self_attn.rotary_emb.inv_freq']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.
Traceback (most recent call last):
File "/home/sam/Doctorproject/OmniQuant-main/main.py", line 419, in
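(Not a fix, just a note: the "newly initialized ... rotary_emb.inv_freq" warning above is usually harmless and often just reflects the installed transformers version rather than a broken checkpoint, since inv_freq is recomputed at load time. A quick environment check, as an assumption to rule out a version mismatch:)

```python
# Print library versions; an older transformers build that still expects
# rotary_emb.inv_freq in the checkpoint is one common source of the warning.
import torch
import transformers

print("transformers:", transformers.__version__)
print("torch:", torch.__version__)
```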
@ChenMnZ Hello, I have also run into problems like this.
I tried your code in runing_falcon180b_on_single_a100_80g.ipynb with llama2-7b: I run the quantization and save it with real quant. However, while loading the pre-computed quantized weights it returns a warning like this,
and then fails when executing model = model.cuda().
The bug looks like this:
I also tried your weights on Hugging Face, but they do not seem to work either.
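(A guess about the model.cuda() failure, since the traceback is not shown here: when pre-computed real-quant weights are loaded into a skeleton built with empty/meta weights, any tensor left on the meta device makes .cuda() fail. A small, hypothetical diagnostic, not part of the notebook:)

```python
import torch

def report_meta_tensors(model: torch.nn.Module) -> None:
    """Print parameters/buffers still on the 'meta' device before calling .cuda()."""
    stuck = [name for name, p in model.named_parameters() if p.device.type == "meta"]
    stuck += [name for name, b in model.named_buffers() if b.device.type == "meta"]
    if stuck:
        print(f"{len(stuck)} tensors still on the meta device, e.g. {stuck[:3]}")
    else:
        print("no meta tensors found; .cuda() should not fail for this reason")
```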