If ref_model is a copy of act_model at the beginning of stage 3, does that mean the kl_divergence is 0?
```python
kl_divergence_estimate = -self.kl_ctl * (log_probs - ref_log_probs)
```
At the beginning of stage 3, kl_divergence_estimate should be zero. But after several steps, the generations of the actor model might differ from the reference model. Please correct me if I've made any mistake.
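To make that concrete, here's a small sanity check (my own sketch, not from the repo; the model name and kl_ctl value are placeholders): two identical models produce identical per-token log-probs, so the estimate is exactly zero at step 0.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Placeholder model, just for illustration; any causal LM works here.
name = "facebook/opt-125m"
tokenizer = AutoTokenizer.from_pretrained(name)
actor = AutoModelForCausalLM.from_pretrained(name).eval()
ref = AutoModelForCausalLM.from_pretrained(name).eval()  # identical copy

inputs = tokenizer("Hello, RLHF!", return_tensors="pt")
with torch.no_grad():
    actor_logits = actor(**inputs).logits
    ref_logits = ref(**inputs).logits

log_probs = torch.log_softmax(actor_logits, dim=-1)
ref_log_probs = torch.log_softmax(ref_logits, dim=-1)

kl_ctl = 0.1  # placeholder KL coefficient, standing in for self.kl_ctl
kl_divergence_estimate = -kl_ctl * (log_probs - ref_log_probs)
print(kl_divergence_estimate.abs().max())  # tensor(0.) while the models match
```

Once the actor takes a few PPO updates, its logits drift away from the reference model's and this term becomes nonzero.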
Thank you @liwyNo. Yes, that's only true for the first step.
I found that the actor model has LoRA parameters but the reference model doesn't, so they may be different even at the first step.
For LoRA, the lora_left_weight (https://github.com/microsoft/DeepSpeedExamples/blob/master/applications/DeepSpeed-Chat/training/utils/module/lora.py#LL39C14-L39C30) is initialized as a zero matrix, which means the LoRA weights won't affect the actor model at the first step.
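A minimal sketch of why that holds (my own simplification of the linked LinearLayer_LoRA, not the exact DeepSpeed-Chat code; class and argument names are hypothetical): because lora_left_weight starts as a zero matrix, the LoRA branch contributes exactly zero, and the wrapped layer's output matches the base layer until training updates it.

```python
import math
import torch
import torch.nn as nn

class LoraLinear(nn.Module):
    """Simplified LoRA wrapper: base(x) + x @ right @ left."""
    def __init__(self, base: nn.Linear, lora_dim: int = 8):
        super().__init__()
        self.base = base
        self.lora_right_weight = nn.Parameter(torch.empty(base.in_features, lora_dim))
        # Zero init: the low-rank delta is identically zero at step 0.
        self.lora_left_weight = nn.Parameter(torch.zeros(lora_dim, base.out_features))
        nn.init.kaiming_uniform_(self.lora_right_weight, a=math.sqrt(5))

    def forward(self, x):
        # The delta x @ right @ left is a zero tensor while left is all zeros.
        return self.base(x) + x @ self.lora_right_weight @ self.lora_left_weight

base = nn.Linear(16, 16)
layer = LoraLinear(base)
x = torch.randn(2, 16)
assert torch.equal(layer(x), base(x))  # identical before any parameter updates
```

So even though the actor carries extra LoRA parameters that the reference model lacks, the two produce the same outputs at the first step, and the KL estimate is still zero there.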
Well, it is.