Doubts about merging
Please check that this issue hasn't been reported before.
- [X] I searched previous Bug Reports and didn't find any similar reports.
Expected Behavior
After DPO training completes and the adapter is merged into the base model, the merged model turns out to be much smaller than the base. Is a quantization step applied during merging?
In addition, the merged model is unusable: inference produces garbled output.
Current behaviour
- Perform DPO training on Mixtral
- Merge the trained adapter into the base model
- Run inference tests
Base model used for DPO: 87 GB
Merged model after training: 14 GB
Inference behavior: garbled output
Steps to reproduce
1) dpo.yml:
base_model: /data1/ljf2/data/Nous-Hermes-2-Mixtral-8x7B-SFT-0204-new
model_type: MistralForCausalLM
tokenizer_type: LlamaTokenizer
is_mistral_derived_model: true
load_in_8bit: false
load_in_4bit: true
strict: false
rl: dpo
datasets:
- path: /data1/ljf2/data/ultrafeedback-binarized-preferences-cleaned
split: train
type: chatml.ultra
- path: /data1/ljf2/data/final-v2
split: train
type: chatml.ultra
dataset_prepared_path: last_data_out
val_set_size: 0.01
output_dir: /data1/ljf2/data/dpo_out
adapter: qlora
lora_model_dir:
sequence_len: 4096
pad_to_sequence_len: true
lora_r: 64
lora_alpha: 32
lora_dropout: 0.05
lora_target_linear: true
lora_fan_in_fan_out:
lora_target_modules:
- gate_proj
- down_proj
- up_proj
- q_proj
- v_proj
- k_proj
- o_proj
gradient_accumulation_steps: 1
micro_batch_size: 1
num_epochs: 1
optimizer: paged_adamw_8bit
lr_scheduler: cosine
learning_rate: 2e-5
train_on_inputs: false
group_by_length: false
bf16: true
fp16: false
tf32: false
gradient_checkpointing: true
early_stopping_patience:
local_rank:
logging_steps: 1
xformers_attention:
flash_attention: true
evals_per_epoch: 1
eval_table_size:
eval_table_max_new_tokens: 128
warmup_steps: 10
save_steps: 500
save_total_limit: 3
debug:
deepspeed: deepspeed_configs/zero2.json
weight_decay: 0.0
fsdp:
fsdp_config:
special_tokens:
save_safetensors: true
2) Merge: python3 -m axolotl.cli.merge_lora dpo.yml --lora_model_dir="/data1/ljf2/data/dpo_out/mixtral-dpo-0207-lora" --output_dir=/data1/ljf2/data
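One possible explanation for the 87 GB to 14 GB shrink is that the merge was run with load_in_4bit: true from dpo.yml, so the merged weights were written out through bitsandbytes 4-bit quantization. As a point of comparison, below is a minimal sketch of merging the adapter in bf16 with transformers and peft; the base and adapter paths are the ones from this report, and the output directory is only an example:

import torch
from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer

base_dir = "/data1/ljf2/data/Nous-Hermes-2-Mixtral-8x7B-SFT-0204-new"
adapter_dir = "/data1/ljf2/data/dpo_out/mixtral-dpo-0207-lora"
out_dir = "/data1/ljf2/data/merged-bf16"  # example output path

# Load the base model in bf16 (not 4-bit) so the merged weights stay full precision.
model = AutoModelForCausalLM.from_pretrained(base_dir, torch_dtype=torch.bfloat16, device_map="cpu")
tokenizer = AutoTokenizer.from_pretrained(base_dir)

# Apply the DPO LoRA adapter and fold it into the base weights.
model = PeftModel.from_pretrained(model, adapter_dir)
model = model.merge_and_unload()

model.save_pretrained(out_dir, safe_serialization=True)
tokenizer.save_pretrained(out_dir)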
Config yaml
No response
Possible solution
No response
Which Operating Systems are you using?
- [X] Linux
- [ ] macOS
- [ ] Windows
Python Version
3.10
axolotl branch-commit
main
Acknowledgements
- [X] My issue title is concise, descriptive, and in title casing.
- [X] I have searched the existing issues to make sure this bug has not been reported yet.
- [X] I am using the latest version of axolotl.
- [X] I have provided enough information for the maintainers to reproduce and diagnose the issue.
Could it be a quantized model? Looking at the config file, there is a quantization_config section.
"quantization_config": {
"_load_in_4bit": true,
"_load_in_8bit": false,
"bnb_4bit_compute_dtype": "bfloat16",
"bnb_4bit_quant_type": "nf4",
"bnb_4bit_use_double_quant": true,
"llm_int8_enable_fp32_cpu_offload": false,
"llm_int8_has_fp16_weight": false,
"llm_int8_skip_modules": null,
"llm_int8_threshold": 6.0,
"load_in_4bit": true,
"load_in_8bit": false,
"quant_method": "bitsandbytes"
},
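One quick way to confirm whether the merged checkpoint was saved quantized is to load its config and look at the quantization_config attribute (the path below is only an example; point it at the merged output directory):

from transformers import AutoConfig

cfg = AutoConfig.from_pretrained("/data1/ljf2/data/merged-model")  # example: path to the merged output
# A non-None value means the weights were saved with a quantization step (e.g. bitsandbytes nf4).
print(getattr(cfg, "quantization_config", None))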
My trick was using save_steps and manually merging the last checkpoint into the base.
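If it helps, pointing the merge command at an intermediate checkpoint looks roughly like this (the checkpoint-500 directory name is only an example; use whichever checkpoint save_steps produced):

python3 -m axolotl.cli.merge_lora dpo.yml --lora_model_dir="/data1/ljf2/data/dpo_out/checkpoint-500" --output_dir=/data1/ljf2/data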
And OP, please update the title of this issue since the current one is unclear.