
Doubts about merging

Open · vip-china opened this issue 1 year ago · 1 comment

Please check that this issue hasn't been reported before.

  • [X] I searched previous Bug Reports and didn't find any similar reports.

Expected Behavior

After DPO training completed, I merged the LoRA adapter into the base model and found that the merged model is much smaller than the base model. Is there a quantization step during the merge process?

Moreover, the merged model is unusable: inference produces garbled output.

Current behaviour

  1. Perform DPO training on Mixtral

  2. Merge the trained models

  3. Inference testing

Size of the base model used for DPO: 87 GB (screenshot attached)

Size of the merged model after training: 14 GB (screenshot attached)

Inference output: garbled text (screenshot attached)

Steps to reproduce

1) dpo.yml:

base_model: /data1/ljf2/data/Nous-Hermes-2-Mixtral-8x7B-SFT-0204-new
model_type: MistralForCausalLM
tokenizer_type: LlamaTokenizer
is_mistral_derived_model: true

load_in_8bit: false
load_in_4bit: true
strict: false

rl: dpo
datasets:
  - path: /data1/ljf2/data/ultrafeedback-binarized-preferences-cleaned
    split: train
    type: chatml.ultra
  - path: /data1/ljf2/data/final-v2
    split: train
    type: chatml.ultra
dataset_prepared_path: last_data_out
val_set_size: 0.01
output_dir: /data1/ljf2/data/dpo_out

adapter: qlora
lora_model_dir:

sequence_len: 4096
pad_to_sequence_len: true

lora_r: 64
lora_alpha: 32
lora_dropout: 0.05
lora_target_linear: true
lora_fan_in_fan_out:
lora_target_modules:
  - gate_proj
  - down_proj
  - up_proj
  - q_proj
  - v_proj
  - k_proj
  - o_proj
gradient_accumulation_steps: 1
micro_batch_size: 1
num_epochs: 1
optimizer: paged_adamw_8bit
lr_scheduler: cosine
learning_rate: 2e-5

train_on_inputs: false
group_by_length: false
bf16: true
fp16: false
tf32: false

gradient_checkpointing: true
early_stopping_patience:
local_rank:
logging_steps: 1
xformers_attention:
flash_attention: true

evals_per_epoch: 1
eval_table_size:
eval_table_max_new_tokens: 128

warmup_steps: 10
save_steps: 500
save_total_limit: 3
debug:
deepspeed: deepspeed_configs/zero2.json
weight_decay: 0.0
fsdp:
fsdp_config:
special_tokens:
save_safetensors: true

2) merge: python3 -m axolotl.cli.merge_lora dpo.yml --lora_model_dir="/data1/ljf2/data/dpo_out/mixtral-dpo-0207-lora" --output_dir=/data1/ljf2/data
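
A quick way to check whether the merged model was written out quantized is to inspect its config.json. The sketch below is illustrative only: the merged-model path is an assumption and should be adjusted to wherever merge_lora actually wrote the merged weights. Note that dpo.yml above still has load_in_4bit: true, so the base model may have been loaded in 4-bit during the merge.

import json
from pathlib import Path

# Assumed location of the merged model; adjust to the directory merge_lora wrote to.
merged_dir = Path("/data1/ljf2/data/merged")

config = json.loads((merged_dir / "config.json").read_text())

# A bitsandbytes "quantization_config" entry here means the merged weights were
# saved quantized, which would explain both the smaller file size and the
# garbled inference output when the model is later loaded normally.
print(json.dumps(config.get("quantization_config"), indent=2))
print("torch_dtype:", config.get("torch_dtype"))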

Config yaml

No response

Possible solution

No response

Which Operating Systems are you using?

  • [X] Linux
  • [ ] macOS
  • [ ] Windows

Python Version

3.10

axolotl branch-commit

main

Acknowledgements

  • [X] My issue title is concise, descriptive, and in title casing.
  • [X] I have searched the existing issues to make sure this bug has not been reported yet.
  • [X] I am using the latest version of axolotl.
  • [X] I have provided enough information for the maintainers to reproduce and diagnose the issue.

vip-china · Feb 07 '24 03:02

Could it be that the merged model is quantized? Looking at its config file, there is a quantization_config section:

  "quantization_config": {
    "_load_in_4bit": true,
    "_load_in_8bit": false,
    "bnb_4bit_compute_dtype": "bfloat16",
    "bnb_4bit_quant_type": "nf4",
    "bnb_4bit_use_double_quant": true,
    "llm_int8_enable_fp32_cpu_offload": false,
    "llm_int8_has_fp16_weight": false,
    "llm_int8_skip_modules": null,
    "llm_int8_threshold": 6.0,
    "load_in_4bit": true,
    "load_in_8bit": false,
    "quant_method": "bitsandbytes"
  },

My workaround was to use save_steps and manually merge the last checkpoint into the base model; a sketch of that manual merge is below. Also, OP, please update the title of this issue, since the current one is unclear.
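
For reference, a minimal sketch of such a manual merge with transformers and peft, assuming the base model path from dpo.yml; the checkpoint and output paths are illustrative, not the exact ones used here:

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

base_path = "/data1/ljf2/data/Nous-Hermes-2-Mixtral-8x7B-SFT-0204-new"  # base_model from dpo.yml
adapter_path = "/data1/ljf2/data/dpo_out/checkpoint-500"                # assumed: last checkpoint written by save_steps
output_path = "/data1/ljf2/data/mixtral-dpo-merged-bf16"                # assumed output directory

# Load the base model in bf16 (not 4-bit) so the merged weights stay full precision.
base = AutoModelForCausalLM.from_pretrained(
    base_path, torch_dtype=torch.bfloat16, device_map="cpu"
)

# Attach the LoRA adapter from the checkpoint and fold it into the base weights.
model = PeftModel.from_pretrained(base, adapter_path)
merged = model.merge_and_unload()

merged.save_pretrained(output_path, safe_serialization=True)
AutoTokenizer.from_pretrained(base_path).save_pretrained(output_path)

A model merged this way should be roughly the same size as the bf16 base, and its config.json should not contain a quantization_config section.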

seungduk-yanolja · Feb 13 '24 13:02