Llama-3-8B
I have obtained the weight offset factors for Llama-3-8B, but I ran into an unusual mismatch issue during compression.
My scaling-factor code is unchanged, yet a dimension error appears as soon as compression starts.
The parameter settings are as follows:
--model
Llama-3-8b/
--epochs
20
--output_dir
llama-3-8b-w6a6/
--eval_ppl
--wbits
6
--abits
6
--lwc
--let
--net
Llama-3-8b
--tasks
arc_easy,arc_challenge,boolq,hellaswag,winogrande,piqa
With w=16, a=16 (i.e., no quantization) I can obtain the expected values, but as soon as the compression setting is applied (w=6, a=6), the problems appear.
@hsb1995 LLaMA-3-8B uses GQA (Group Query Attention), which is not supported by current ‘let’.
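(For anyone hitting the same error: below is a rough, hypothetical sketch of where the dimension mismatch can come from. It is not OmniQuant's actual code; the numbers are the standard Llama-3-8B config values, and it assumes the LET q/k smoothing scale is sized for the query projection's output channels, as in the non-GQA models.)

```python
import torch

# Standard Llama-3-8B attention shapes (hidden_size=4096, 32 query heads,
# 8 key/value heads under GQA); head_dim = 4096 // 32 = 128.
hidden_size = 4096
num_attention_heads = 32
num_key_value_heads = 8
head_dim = hidden_size // num_attention_heads

q_out = num_attention_heads * head_dim    # 4096 output channels in q_proj
kv_out = num_key_value_heads * head_dim   # 1024 output channels in k_proj

# A LET-style q/k smoothing scale sized for q_proj's outputs no longer
# matches k_proj's outputs once GQA shrinks the number of KV heads.
scale = torch.ones(q_out)
k_weight = torch.randn(kv_out, hidden_size)
try:
    _ = k_weight * scale.unsqueeze(1)     # 1024 rows vs a 4096-element scale
except RuntimeError as err:
    print("dimension mismatch, as expected with GQA:", err)
```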
Professor, thank you for your great work. I don't really understand how the GQA issue you mentioned should be handled.
Do I understand you correctly that I keep the original "generateAct_scale.shift" file unchanged to obtain the "act_scales" and "act_shifts" files,
and then perform only the weight quantization afterwards? (See the cache-inspection sketch after this post.)
Parameter settings:
CUDA_VISIBLE_DEVICES=0 python main.py \
--model /PATH/TO/LLaMA/llama-8b \
--epochs 20 --output_dir ./log/llama-8b-w6a6 \
--eval_ppl --wbits 6 --abits 6 --lwc
Is the above operation possible?
I only removed the --let option.
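(Side note on the act_scales / act_shifts question above: a small, hypothetical way to inspect those caches, assuming they are torch-saved dicts of per-module tensors; the file paths below are placeholders.)

```python
import torch

# Hypothetical paths -- adjust to wherever the generation step wrote the caches.
act_scales = torch.load("./act_scales/llama-3-8b.pt", map_location="cpu")
act_shifts = torch.load("./act_shifts/llama-3-8b.pt", map_location="cpu")

# Each entry should be a per-channel tensor keyed by module name; checking a
# few shapes is a quick way to see whether the GQA projections look consistent.
for name, scale in list(act_scales.items())[:5]:
    print(name, tuple(scale.shape))
print("shift entries:", len(act_shifts))
```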
Hey, professor. I gave it a try, but it is really hard to change. The current errors are as follows. What should I do when I encounter these?
[2024-04-24 17:14:17 root](omniquant.py 50): INFO Starting ...
Some weights of LlamaForCausalLM were not initialized from the model checkpoint at /home/sam/Doctorproject/weight/llama-3-8b/LLM-Research/Llama-3-8b/ and are newly initialized: ['model.layers.17.self_attn.rotary_emb.inv_freq', 'model.layers.1.self_attn.rotary_emb.inv_freq', 'model.layers.3.self_attn.rotary_emb.inv_freq', 'model.layers.4.self_attn.rotary_emb.inv_freq', 'model.layers.16.self_attn.rotary_emb.inv_freq', 'model.layers.31.self_attn.rotary_emb.inv_freq', 'model.layers.21.self_attn.rotary_emb.inv_freq', 'model.layers.10.self_attn.rotary_emb.inv_freq', 'model.layers.24.self_attn.rotary_emb.inv_freq', 'model.layers.28.self_attn.rotary_emb.inv_freq', 'model.layers.11.self_attn.rotary_emb.inv_freq', 'model.layers.13.self_attn.rotary_emb.inv_freq', 'model.layers.14.self_attn.rotary_emb.inv_freq', 'model.layers.15.self_attn.rotary_emb.inv_freq', 'model.layers.2.self_attn.rotary_emb.inv_freq', 'model.layers.20.self_attn.rotary_emb.inv_freq', 'model.layers.27.self_attn.rotary_emb.inv_freq', 'model.layers.0.self_attn.rotary_emb.inv_freq', 'model.layers.7.self_attn.rotary_emb.inv_freq', 'model.layers.6.self_attn.rotary_emb.inv_freq', 'model.layers.9.self_attn.rotary_emb.inv_freq', 'model.layers.29.self_attn.rotary_emb.inv_freq', 'model.layers.26.self_attn.rotary_emb.inv_freq', 'model.layers.22.self_attn.rotary_emb.inv_freq', 'model.layers.19.self_attn.rotary_emb.inv_freq', 'model.layers.12.self_attn.rotary_emb.inv_freq', 'model.layers.8.self_attn.rotary_emb.inv_freq', 'model.layers.30.self_attn.rotary_emb.inv_freq', 'model.layers.25.self_attn.rotary_emb.inv_freq', 'model.layers.5.self_attn.rotary_emb.inv_freq', 'model.layers.18.self_attn.rotary_emb.inv_freq', 'model.layers.23.self_attn.rotary_emb.inv_freq']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.
Traceback (most recent call last):
File "/home/sam/Doctorproject/OmniQuant-main/main.py", line 419, in
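(Not a fix, just a note: the "newly initialized ... rotary_emb.inv_freq" warning above is usually harmless and often just reflects the installed transformers version rather than a broken checkpoint, since inv_freq is recomputed at load time. A quick environment check, as an assumption to rule out a version mismatch:)

```python
# Print library versions; an older transformers build that still expects
# rotary_emb.inv_freq in the checkpoint is one common source of the warning.
import torch
import transformers

print("transformers:", transformers.__version__)
print("torch:", torch.__version__)
```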
@ChenMnZ Hello, I have also run into problems like this.
I tried your code in runing_falcon180b_on_single_a100_80g.ipynb with llama2-7b: I run the quantization and save it with real quant. However, while loading the pre-computed quantized weights it returns a warning like this,
and then fails when executing model = model.cuda().
The bug looks like this:
I also tried your weights on Hugging Face, but they do not seem to work either.
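(A guess about the model.cuda() failure, since the traceback is not shown here: when pre-computed real-quant weights are loaded into a skeleton built with empty/meta weights, any tensor left on the meta device makes .cuda() fail. A small, hypothetical diagnostic, not part of the notebook:)

```python
import torch

def report_meta_tensors(model: torch.nn.Module) -> None:
    """Print parameters/buffers still on the 'meta' device before calling .cuda()."""
    stuck = [name for name, p in model.named_parameters() if p.device.type == "meta"]
    stuck += [name for name, b in model.named_buffers() if b.device.type == "meta"]
    if stuck:
        print(f"{len(stuck)} tensors still on the meta device, e.g. {stuck[:3]}")
    else:
        print("no meta tensors found; .cuda() should not fail for this reason")
```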