Can peft support ColumnParallelLinear?

Open wjn1996 opened this issue 3 months ago • 1 comment

System Info

I have a model whose architecture contains xxxParallel modules, which are used for parallel inference:

BaichuanForCausalLM(
  (model): BaiChuanModel(
    (embed_tokens): VocabParallelEmbedding()
    (layers): ModuleList(
      (0-31): 32 x BaiChuanDecoderLayer(
        (self_attn): BaiChuanAttention(
          (W_pack): ColumnParallelLinear()
          (o_proj): RowParallelLinear()
          (attn): PagedAttentionWithALiBi()
        )
        (mlp): BaiChuanMLP(
          (gate_up_proj): ColumnParallelLinear()
          (down_proj): RowParallelLinear()
          (act_fn): SiluAndMul()
        )
        (input_layernorm): RMSNorm()
        (post_attention_layernorm): RMSNorm()
      )
    )
    (norm): RMSNorm()
  )
  (lm_head): ColumnParallelLinear()
  (sampler): Sampler()
)
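
For reference, here is a small inspection sketch (it assumes model is the module whose structure is printed above) that maps each layer class to the attribute names it appears under; those attribute names are what would go into LoRA's target_modules:

# Sketch: list which attribute names belong to which layer class, assuming
# `model` is the loaded vLLM module printed above.
from collections import defaultdict

layer_names = defaultdict(set)
for name, module in model.named_modules():
    layer_names[type(module).__name__].add(name.split(".")[-1])

for cls_name, attrs in sorted(layer_names.items()):
    print(cls_name, sorted(attrs))
# Based on the dump above, this should show, among others:
#   ColumnParallelLinear: W_pack, gate_up_proj, lm_head
#   RowParallelLinear: o_proj, down_proj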

I want to load this model directly with PEFT (LoRA), but it throws an error:

ValueError: Target module ColumnParallelLinear() is not supported. Currently, only the following modules are supported: `torch.nn.Linear`, `torch.nn.Embedding`, `torch.nn.Conv2d`, `transformers.pytorch_utils.Conv1D`.

So, how can I apply LoRA to this model without changing the model architecture?

Who can help?

@pacman100 @younesbelkada @BenjaminBossan @sayakpaul

Information

  • [ ] The official example scripts
  • [X] My own modified scripts

Tasks

  • [ ] An officially supported task in the examples folder
  • [X] My own task or dataset (give details below)

Reproduction

# LLM and SamplingParams
# pip install vllm==0.2.1 (cuda=11.8)
from vllm import LLM, SamplingParams
from peft import PeftModel
# Load the LoRA adapter on top of the base model with PEFT
def load_peft_model(model, peft_model_path):
    peft_model = PeftModel.from_pretrained(model, peft_model_path)
    return peft_model

prompts = [
    "xxx",
]

sampling_params = SamplingParams(temperature=1.0, top_p=0.9)

model_name = "baichuan2-7b-base"
origin_model_path = "xxx/pre-trained-lm/{}".format(model_name)
saved_model_path = "xxx/v2/{}/checkpoint-8000".format(model_name) # lora path
save_answer_path = "xxx/{}".format(model_name)

llm = LLM(model=origin_model_path, trust_remote_code=True)

model = llm.llm_engine.workers[0].model
model = load_peft_model(model, saved_model_path)
llm.llm_engine.workers[0].model = model


outputs = llm.generate(
    prompts, 
    sampling_params,
    # lora_request=LoRARequest("headline-lora", 1, saved_model_path)
    )


for output in outputs:
    prompt = output.prompt
    generated_text = output.outputs[0].text
    print(f"Prompt: {prompt!r}, Generated text: {generated_text!r}")

Expected behavior

The LoRA adapter should load onto this model, i.e. PEFT should support ColumnParallelLinear and RowParallelLinear as target modules, without requiring changes to the model architecture.

wjn1996 · May 05 '24 13:05

So I assume you're using Megatron. Did you try this:

https://huggingface.co/docs/peft/v0.10.0/en/package_reference/lora#peft.LoraConfig.megatron_config

Here is an example: https://github.com/huggingface/peft/blob/main/tests/test_lora_megatron.py
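
A minimal sketch of what that configuration looks like, assuming the parallel layers actually come from megatron.core (vLLM ships its own ColumnParallelLinear, so this path applies to Megatron-built models) and using placeholder values; the exact form expected for megatron_config is best taken from the linked test:

# Sketch: LoRA on Megatron tensor-parallel linear layers via LoraConfig's
# megatron_config / megatron_core options. All values are placeholders and
# must match how the model was actually built.
from megatron.core.transformer.transformer_config import TransformerConfig
from peft import LoraConfig, get_peft_model

transformer_config = TransformerConfig(
    num_layers=32,
    hidden_size=4096,
    num_attention_heads=32,
    use_cpu_initialization=True,
)

lora_config = LoraConfig(
    r=8,
    lora_alpha=16,
    target_modules=["W_pack", "o_proj"],  # parallel linear module names from the dump above
    megatron_config=transformer_config,   # used to build LoRA's parallel linear layers
    megatron_core="megatron.core",        # package providing ColumnParallelLinear / RowParallelLinear
)

peft_model = get_peft_model(model, lora_config)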

BenjaminBossan · May 06 '24 09:05