
LoRA fine-tuning Phi-3.5 MoE

Manikanta5112 opened this issue • 7 comments

Hi,

I recently fine-tuned the Phi-3.5-MoE-instruct and Phi-3.5-mini-instruct models using PEFT LoRA. The MoE model seems to perform much worse than 3.5 Mini. Are there any specific things to keep in mind when LoRA fine-tuning a mixture-of-experts model? Also, during fine-tuning of the MoE model, the validation loss always shows as "No Log".

Manikanta5112 • Sep 08 '24

Can you show me the training results and hyperparameters? Did you fine-tune using Unsloth?

sujankarki269 • Sep 11 '24

Sorry, there is no Unsloth support for the Phi-3.5-MoE-instruct model.

The training loss keeps decreasing, but the validation loss always shows as "No Log".

These are the hyperparameters:

"base_model": "microsoft/Phi-3.5-MoE-instruct", "max_seq_length": 4096,

"lora_config":
{
    "rank": 32,
    "alpha": 32,
    "task_type": "CAUSAL_LM",
    "target_modules": ["q_proj", "k_proj", "v_proj", "o_proj", "gate_proj", "up_proj", "down_proj",], #"all-linear",
    "rslora": True,
    "bias": "lora_only"
},

"quantization_config":
{
    "load_in_4bit": False,
    "bnb_4bit_quant_type": "nf4",
    "bnb_4bit_use_double_quant": True
},

"training_arguments":
{
    "logging_dir": f"logs/{get_iter_name(__file__)}/",
    "output_dir":f"models/{get_iter_name(__file__)}/",
    "evaluation_strategy": "steps",
    "save_strategy": "steps",
    "logging_strategy": "steps",
    "learning_rate": 2e-4, 
    "weight_decay": 0.1,
    "logging_steps": 5,
    "eval_steps": 10,
    "save_steps": 10,
    "eval_delay": 100,
    "warmup_steps": 0,
    "save_total_limit": 5,
    
    "optim": "adamw_torch_fused",
    "per_device_train_batch_size": 2*2*2*2,
    "per_device_eval_batch_size": 2*2*2*2,
    "gradient_accumulation_steps": 4*2*2,
    "eval_accumulation_steps": 4*2*2,
    "gradient_checkpointing": True,

    "adam_beta1": 0.9,
    "adam_beta2": 0.95,
    "adam_epsilon": 1e-8,
    "max_grad_norm": 1.0,
    "lr_scheduler_type": 'cosine',
    "num_train_epochs": 1,
    "continue_from_checkpoint": True,

    "fp16": False,
    "fp16_full_eval": False,

    "bf16": True,
    "bf16_full_eval": True,
},
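
In code, these settings map roughly onto the following (a trimmed-down sketch with a placeholder output path, not my exact training script):

```python
import torch
from transformers import AutoModelForCausalLM, TrainingArguments
from peft import LoraConfig, get_peft_model

model = AutoModelForCausalLM.from_pretrained(
    "microsoft/Phi-3.5-MoE-instruct",
    torch_dtype=torch.bfloat16,
    trust_remote_code=True,
)

lora_config = LoraConfig(
    r=32,
    lora_alpha=32,
    task_type="CAUSAL_LM",
    # These names only take effect if the model really contains such modules.
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj",
                    "gate_proj", "up_proj", "down_proj"],
    use_rslora=True,
    bias="lora_only",
)
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()  # shows what LoRA actually touches

training_args = TrainingArguments(
    output_dir="models/phi35_moe_lora",  # placeholder path
    evaluation_strategy="steps",
    eval_steps=10,
    eval_delay=100,      # no eval (and no validation loss) is logged before step 100
    logging_steps=5,
    save_steps=10,
    save_total_limit=5,
    learning_rate=2e-4,
    weight_decay=0.1,
    optim="adamw_torch_fused",
    per_device_train_batch_size=16,      # 2*2*2*2
    per_device_eval_batch_size=16,
    gradient_accumulation_steps=16,      # 4*2*2
    gradient_checkpointing=True,
    adam_beta1=0.9,
    adam_beta2=0.95,
    adam_epsilon=1e-8,
    max_grad_norm=1.0,
    lr_scheduler_type="cosine",
    num_train_epochs=1,
    bf16=True,
    bf16_full_eval=True,
)
```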

Manikanta5112 • Sep 11 '24

The phi-3.5-mini-instruct model has o_proj and qkv_proj, so why are you adding ["q_proj", "k_proj", "v_proj", "o_proj", "gate_proj", "up_proj", "down_proj"] as target modules? See the layers in the image below and check what you are actually targeting. It also does not have "gate_proj" or "up_proj"; it has gate_up_proj.

(screenshot: Phi-3.5-mini-instruct module names, showing qkv_proj and gate_up_proj instead of separate q/k/v and gate/up projections)
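
You can list the module names yourself without downloading the full weights (a quick sketch using accelerate's empty-weights init; swap in the MoE checkpoint to inspect that layout instead):

```python
import torch
from accelerate import init_empty_weights
from transformers import AutoConfig, AutoModelForCausalLM

# Use "microsoft/Phi-3.5-MoE-instruct" here to check the MoE model's layout.
config = AutoConfig.from_pretrained(
    "microsoft/Phi-3.5-mini-instruct", trust_remote_code=True
)
with init_empty_weights():
    model = AutoModelForCausalLM.from_config(config, trust_remote_code=True)

# Only these leaf names are meaningful entries for target_modules.
linear_names = {
    name.split(".")[-1]
    for name, module in model.named_modules()
    if isinstance(module, torch.nn.Linear)
}
print(sorted(linear_names))
# For Phi-3.5-mini this prints something like:
# ['down_proj', 'gate_up_proj', 'lm_head', 'o_proj', 'qkv_proj']
```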

I also fine-tuned the phi-3.5-mini-instruct model, and it does report validation loss. (screenshot: training log with eval loss values)

sujankarki269 • Sep 12 '24

I am not using phi-3.5-mini; I am using the phi-3.5-moe-instruct model.

Manikanta5112 • Sep 12 '24

> The phi-3.5-mini-instruct model has o_proj and qkv_proj, so why are you adding ["q_proj", "k_proj", "v_proj", "o_proj", "gate_proj", "up_proj", "down_proj"] as target modules? It also does not have "gate_proj" or "up_proj"; it has gate_up_proj. I also fine-tuned the phi-3.5-mini-instruct model, and it does report validation loss.

https://github.com/microsoft/Phi-3CookBook/blob/main/code/04.Finetuning/Phi-3-finetune-qlora-python.ipynb

The example script in this cookbook appears to be misleading, as it shows

target_modules = ['k_proj', 'q_proj', 'v_proj', 'o_proj', 'gate_proj', 'down_proj', 'up_proj'],

which is incorrect based on the model's architecture.
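
Based on the module names the mini model actually has, presumably something like this would be the matching list (untested, just reading off the layer names above):

```python
# Assumed fix for the fused projections in Phi-3.5-mini (not verified):
target_modules = ["qkv_proj", "o_proj", "gate_up_proj", "down_proj"]
```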

sofyc • Sep 12 '24

@sofyc I think "target_modules": ['k_proj', 'q_proj', 'v_proj', 'o_proj'] is okay.

kinfey • Sep 17 '24

When I fine-tune with that list, only o_proj is adjusted by LoRA (screenshots: the target_modules setting and the resulting adapter layers).

This is because there is a single fused qkv_proj layer.
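
A quick way to confirm which modules actually received adapters (a sketch, assuming `model` is the PEFT-wrapped model):

```python
# Every module that LoRA attached to carries a populated lora_A dict.
lora_targets = [
    name
    for name, module in model.named_modules()
    if hasattr(module, "lora_A") and len(module.lora_A) > 0
]
print(len(lora_targets))
print(lora_targets[:4])
# With ["k_proj", "q_proj", "v_proj", "o_proj"] on Phi-3.5-mini, only the
# ...self_attn.o_proj entries appear, since q/k/v are fused into qkv_proj.
```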

sujankarki269 • Sep 17 '24