
How Llama-2 & BGE are finetuned in LM-Cocktail

Open · cahuja1992 opened this issue 2 years ago · 1 comment

There is a table in the paper listing the train-test data splits for every domain, but the finetuning details are not included. Could you please provide those details as well? Specifically:

  • What are the hyperparameters used for finetuning?
  • How can the finetuned model be reproduced, if we need to do so?
  • Was any hyperparameter tuning performed when finetuning the base model on the target domain?

cahuja1992 · Jan 04 '24 08:01

Thanks for your interest in our work!

  1. We show the important hyperparameters in Section 3 (Experimental Setup). Here is the detailed command we used to finetune Llama-2 with the FastChat tool:
    --num_train_epochs 3
    --per_device_train_batch_size 2
    --per_device_eval_batch_size 2
    --gradient_accumulation_steps 8
    --evaluation_strategy "no"
    --save_strategy "steps"
    --save_steps 1200
    --save_total_limit 10
    --learning_rate 2e-5
    --weight_decay 0.
    --warmup_ratio 0.03
    --lr_scheduler_type 'cosine'
    --logging_steps 10
    --deepspeed ./ds_config.json
    --tf32 True
    --model_max_length 1024
    --gradient_checkpointing True
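
For reference, these flags are mostly standard Hugging Face TrainingArguments passed through to FastChat's training script. A minimal sketch of how they might be assembled into a full launch command follows; the script path, base model, data path, output directory, and GPU count are assumptions for illustration, not details confirmed in the paper:

```bash
# Hypothetical launch; model, data, and output paths are placeholders.
torchrun --nproc_per_node=8 fastchat/train/train_mem.py \
    --model_name_or_path meta-llama/Llama-2-7b-hf \
    --data_path ./domain_train.json \
    --output_dir ./llama2-domain-finetuned \
    --num_train_epochs 3 \
    --per_device_train_batch_size 2 \
    --per_device_eval_batch_size 2 \
    --gradient_accumulation_steps 8 \
    --evaluation_strategy "no" \
    --save_strategy "steps" \
    --save_steps 1200 \
    --save_total_limit 10 \
    --learning_rate 2e-5 \
    --weight_decay 0. \
    --warmup_ratio 0.03 \
    --lr_scheduler_type "cosine" \
    --logging_steps 10 \
    --deepspeed ./ds_config.json \
    --tf32 True \
    --model_max_length 1024 \
    --gradient_checkpointing True
```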

For BGE, the command used in FlagEmbedding is:
    --normlized True
    --temperature 0.02
    --do_train
    --train_data $DATA_PATH
    --query_max_len 48
    --passage_max_len 200
    --fp16
    --per_device_train_batch_size 32
    --sentence_pooling_method cls
    --save_steps 2000
    --train_group_size 8
    --learning_rate 2e-5
    --num_train_epochs ${EPOCH[i]}
    --negatives_cross_device
    --dataloader_num_workers 8
    --logging_steps 20
    --warmup_ratio 0.1
    --weight_decay 0.01
    --overwrite_output_dir True

These experiments were conducted on 8*A100 (40G) GPUs.
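
For completeness, here is a minimal sketch of how these flags could be assembled into a full launch, assuming the torchrun pattern from the FlagEmbedding finetune example; the base model, output directory, and GPU count are assumptions, while $DATA_PATH and ${EPOCH[i]} are kept as in the command above (note that --normlized is the spelling used by the script's argument parser):

```bash
# Hypothetical launch; base model, output dir, and GPU count are placeholders.
torchrun --nproc_per_node 8 \
    -m FlagEmbedding.baai_general_embedding.finetune.run \
    --output_dir ./bge-domain-finetuned \
    --model_name_or_path BAAI/bge-base-en-v1.5 \
    --normlized True \
    --temperature 0.02 \
    --do_train \
    --train_data $DATA_PATH \
    --query_max_len 48 \
    --passage_max_len 200 \
    --fp16 \
    --per_device_train_batch_size 32 \
    --sentence_pooling_method cls \
    --save_steps 2000 \
    --train_group_size 8 \
    --learning_rate 2e-5 \
    --num_train_epochs ${EPOCH[i]} \
    --negatives_cross_device \
    --dataloader_num_workers 8 \
    --logging_steps 20 \
    --warmup_ratio 0.1 \
    --weight_decay 0.01 \
    --overwrite_output_dir True
```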

  2. We provide the fine-tuned models on Hugging Face, so you can reproduce the experimental results with them directly. If you want to reproduce the finetuning itself, you can download the data from intfloat/llm-retriever-tasks. We can also share the preprocessed training data and will let you know when it is ready.
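
For example, the raw task data can be pulled from the Hugging Face Hub before preprocessing; a minimal sketch using huggingface-cli, assuming the repo is hosted as a dataset and using a placeholder local directory:

```bash
# Hypothetical download of the task data; --local-dir is a placeholder.
pip install -U "huggingface_hub[cli]"
huggingface-cli download intfloat/llm-retriever-tasks \
    --repo-type dataset \
    --local-dir ./llm-retriever-tasks
```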

  3. We simply selected a suitable set of parameters; no hyperparameter tuning was performed.

staoxiao · Jan 04 '24 09:01