why "target_modules" does not recognize any parameters?
model = AutoGPTQForCausalLM.from_quantized(
    model_name,
    # use_triton=True,
    # warmup_triton=False,
    trainable=True,
    inject_fused_attention=False,
    inject_fused_mlp=False,
    # **kwargs
)
# model.warmup_triton()
print(model)
print("model is loaded.")
model.resize_token_embeddings(32008)

peft_config = GPTQLoraConfig(
    lora_alpha=lora_alpha,
    lora_dropout=lora_dropout,
    r=lora_r,
    bias="none",
    task_type="CAUSAL_LM",
    target_modules=[
        "q_proj",
        "k_proj",
        "v_proj",
        "o_proj",
        "gate_proj",
        "up_proj",
        "down_proj",
    ],
    modules_to_save=[
        "embed_tokens",
        "lm_head",
    ],
)
model = get_gptq_peft_model(model, peft_config=peft_config, auto_find_all_linears=True, train_mode=True)
This code block is part of my trainer code for PEFT fine-tuning of GPTQ-quantized models. Whenever I run it, I get the message below.
ValueError: Target modules set() not found in the base model. Please check the target modules and try again.
But when I print the base model, I can clearly see that those module names exist, as shown below.
LlamaGPTQForCausalLM(
  (model): LlamaForCausalLM(
    (model): LlamaModel(
      (embed_tokens): Embedding(32003, 5120, padding_idx=0)
      (layers): ModuleList(
        (0-39): 40 x LlamaDecoderLayer(
          (self_attn): LlamaAttention(
            (rotary_emb): LlamaRotaryEmbedding()
            (k_proj): QuantLinear()
            (o_proj): QuantLinear()
            (q_proj): QuantLinear()
            (v_proj): QuantLinear()
          )
          (mlp): LlamaMLP(
            (act_fn): SiLUActivation()
            (down_proj): QuantLinear()
            (gate_proj): QuantLinear()
            (up_proj): QuantLinear()
          )
          (input_layernorm): LlamaRMSNorm()
          (post_attention_layernorm): LlamaRMSNorm()
        )
      )
      (norm): LlamaRMSNorm()
    )
    (lm_head): Linear(in_features=5120, out_features=32003, bias=False)
  )
)
And when I leave target_modules empty, the error message changes to the one below.
ValueError: Target modules [] not found in the base model. Please check the target modules and try again.
What modification would make my code work? Thanks for reading this issue.
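For reference, the module names can also be listed directly from the loaded model to compare against the target_modules entries. This is only a minimal sketch: matching on the class name is just a convenient heuristic, since QuantLinear lives in different auto_gptq submodules depending on the backend.

for name, module in model.named_modules():
    # print every leaf that looks like a (quantized) linear layer
    if type(module).__name__ in ("QuantLinear", "Linear"):
        print(name, "->", type(module).__name__)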
That's because of auto_find_all_linears: it attaches LoRA only to the QuantLinear modules, which is why your target module names are not found. I have the same problem. Setting auto_find_all_linears=False gives me another error instead, a recursive function call error, and when I lift the recursion limit I get yet another strange error. I don't know what to do; did you manage to get it working?
https://github.com/PanQiWei/AutoGPTQ/issues/509

That's why I don't use AutoGPTQForCausalLM at all. Instead, I merged my LoRA-trained model into the base weights, converted the merged model to GPTQ weights, and then simply loaded it with AutoModelForCausalLM. I don't have to resolve the dependency issues either, and this approach works better for me.
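Roughly, the pipeline I mean looks like the sketch below. The paths and the calibration text are just placeholders, and the exact auto_gptq arguments may differ between versions, so treat it as an outline rather than a finished script.

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel
from auto_gptq import AutoGPTQForCausalLM, BaseQuantizeConfig

base_dir, adapter_dir = "llama-13b-base", "lora-adapter"          # placeholder paths
merged_dir, quantized_dir = "llama-13b-merged", "llama-13b-merged-gptq"

# 1. Merge the trained LoRA adapter back into the full-precision base model.
base = AutoModelForCausalLM.from_pretrained(base_dir, torch_dtype=torch.float16)
merged = PeftModel.from_pretrained(base, adapter_dir).merge_and_unload()
merged.save_pretrained(merged_dir)
tokenizer = AutoTokenizer.from_pretrained(base_dir)
tokenizer.save_pretrained(merged_dir)

# 2. Quantize the merged checkpoint with auto_gptq using a small calibration set.
quantize_config = BaseQuantizeConfig(bits=4, group_size=128, desc_act=False)
gptq_model = AutoGPTQForCausalLM.from_pretrained(merged_dir, quantize_config)
examples = [tokenizer("placeholder calibration text", return_tensors="pt")]
gptq_model.quantize(examples)
gptq_model.save_quantized(quantized_dir, use_safetensors=True)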
I'm not quite sure what you're referring to; could you give me a reference or some more details? Are you saying you ran GPTQ with the LoRA adapter attached? Does loading and unloading the LoRA still work that way?

There is no particular reference. My goal was to fine-tune an LLM with LoRA and then raise that LLM's inference speed, and I was considering quantization before turning to libraries such as llama.cpp or vLLM. So I had two options in mind: 1. train with LoRA, merge the adapter, then quantize; 2. train LoRA on top of an already quantized model. I found that option 2 currently does not work because of library issues, so I have only experimented with option 1. My hope was that, if option 2 were possible, models of 34B and larger could be LoRA-trained within a single A100-80GB, but since that looks impossible right now I recommended option 1. In other words, I simply built the model first with the commonly used LoRA training script based on SFTTrainer and then shrank it with the GPTQ or AWQ libraries to reduce its size; the actual task of this issue remains unsolved.
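For the inference side, the quantized result can then be loaded back with plain transformers, roughly as below. The path is the placeholder from the sketch above, and optimum plus auto-gptq need to be installed for transformers to read the GPTQ checkpoint.

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

quantized_dir = "llama-13b-merged-gptq"   # placeholder path
tokenizer = AutoTokenizer.from_pretrained(quantized_dir)
model = AutoModelForCausalLM.from_pretrained(quantized_dir, device_map="auto", torch_dtype=torch.float16)

inputs = tokenizer("Hello, how are you?", return_tensors="pt").to(model.device)
output = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(output[0], skip_special_tokens=True))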
I succeeded by quantizing with auto_gptq and then training QLoRA with peft LoRA.
Can you please mention the change you made to the GPTQLoRA code? Did you use get_peft_model() instead of using get_gptq_peft_model()?
@Shounak-D I just used a plain old QLoRA Hugging Face example. I'm not sure how to apply DeepSpeed yet, but it should train fine with DDP.
import copy
import logging
import os
import torch
from datasets import load_dataset
from peft import LoraConfig, TaskType, get_peft_model, prepare_model_for_kbit_training
from setproctitle import setproctitle
from transformers import AutoModelForCausalLM, AutoTokenizer, HfArgumentParser, Trainer
from transformers.trainer_utils import is_main_process
from arguments import DatasetsArguments, ModelArguments, MyTrainingArguments
from utils import DataCollatorForSupervisedDataset
from peft.tuners.lora import LoraLayer
IGNORE_INDEX = -100
USER = "User:"
SYSTEM = "System:"
UNUSED0 = "<|unused0|>"
UNUSED1 = "<|unused1|>"
LLM42_QC_PROMPT = "### Input Query:\n{Input}\n\n" + "### Converted Query:\n"
def main(model_args: ModelArguments, dataset_args: DatasetsArguments, training_args: MyTrainingArguments):
    setproctitle("gptq-qlora")

    world_size = int(os.environ.get("WORLD_SIZE", 1))
    ddp = world_size != 1
    device_map = "auto"  # single-process fallback; overridden below when running under DDP
    if ddp:
        device_map = {"": int(os.environ.get("LOCAL_RANK") or 0)}

    tokenizer = AutoTokenizer.from_pretrained(model_args.model_name_or_path, model_max_length=model_args.max_length)
    model = AutoModelForCausalLM.from_pretrained(
        model_args.model_name_or_path,
        device_map=device_map,
        low_cpu_mem_usage=True,
        use_cache=False,
        torch_dtype="auto",
    )
    model.config.use_cache = False
    # Required when training an auto_gptq-quantized checkpoint: the exllama kernels are inference-only.
    model.config.use_exllama = False
    if training_args.gradient_checkpointing:
        model.gradient_checkpointing_enable()
    model = prepare_model_for_kbit_training(model)
    peft_config = LoraConfig(
        task_type=TaskType.CAUSAL_LM,
        fan_in_fan_out=False,
        inference_mode=False,
        r=16,
        target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
        lora_dropout=0,
        lora_alpha=16,
    )
    model = get_peft_model(model, peft_config)

    # Standard QLoRA dtype handling: LoRA layers in bf16, norms in fp32, embeddings/lm_head in bf16.
    for name, module in model.named_modules():
        print(name, module)
        if isinstance(module, LoraLayer):
            if training_args.bf16:
                module = module.to(torch.bfloat16)
        if "norm" in name:
            module = module.to(torch.float32)
        if "lm_head" in name or "embed_tokens" in name:
            if hasattr(module, "weight"):
                if training_args.bf16 and module.weight.dtype == torch.float32:
                    module = module.to(torch.bfloat16)
    dataset = load_dataset("json", data_files=dataset_args.data_path)["train"]

    def preprocess(raw):
        clean_text = raw["prompt"].replace("User:\n", "").replace("\nSystem:\n", "")
        input_text = LLM42_QC_PROMPT.format_map({"Input": clean_text})
        label_text = raw["instruction"] + tokenizer.eos_token
        total_text = input_text + label_text
        input_seq_token_len = len(tokenizer(input_text)["input_ids"])
        tokenized_text = tokenizer(total_text, return_token_type_ids=False, return_tensors="pt")
        raw["input_ids"] = tokenized_text["input_ids"][0]
        raw["attention_mask"] = tokenized_text["attention_mask"][0]
        # Mask the prompt tokens so the loss is only computed on the response part.
        labels_ids = copy.deepcopy(raw["input_ids"])
        labels_ids[:input_seq_token_len] = IGNORE_INDEX
        raw["labels"] = labels_ids
        return raw

    dataset = dataset.map(preprocess, remove_columns=dataset.column_names)
    dataset = dataset.filter(lambda x: len(x["input_ids"]) < 512, batched=False)
    dataset.set_format("torch")

    data_collator = DataCollatorForSupervisedDataset(tokenizer=tokenizer)

    if training_args.local_rank == 0:
        import wandb

        wandb.init(
            project=training_args.wandb_project, entity=training_args.wandb_entity, name=training_args.wandb_name
        )

    trainer = Trainer(model=model, data_collator=data_collator, train_dataset=dataset, args=training_args)
    trainer.train()
    if training_args.local_rank == 0:
        model.save_pretrained(training_args.output_dir)
if __name__ == "__main__":
    parser = HfArgumentParser((ModelArguments, DatasetsArguments, MyTrainingArguments))
    model_args, dataset_args, training_args = parser.parse_args_into_dataclasses()
    logging.basicConfig(
        format="%(asctime)s - %(levelname)s - %(name)s - %(message)s",
        datefmt="%m/%d/%Y %H:%M:%S",
        level=logging.INFO if is_main_process(training_args.local_rank) else logging.WARN,
    )
    main(model_args=model_args, dataset_args=dataset_args, training_args=training_args)