
If ref_model is a copy of act_model at the beginning of stage 3, does it mean the kl_divergence is 0?

Open janelu9 opened this issue 2 years ago • 1 comment

kl_divergence_estimate = -self.kl_ctl * (log_probs - ref_log_probs)

janelu9 avatar Apr 24 '23 09:04 janelu9

At the beginning of stage 3, kl_divergence_estimate should be zero. But after several steps, the actor model's generations might differ from the reference model's. Please correct me if I'm mistaken.

liwyNo avatar Apr 27 '23 07:04 liwyNo
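
For illustration, here is a minimal sketch of the formula quoted above (the kl_ctl value and the example log-probabilities are made up, not taken from the repository), showing why the estimate is exactly zero while the reference model is still an identical copy of the actor:

```python
import torch

# Illustrative coefficient and log-probs; the real values come from the
# training config and the sampled sequences.
kl_ctl = 0.1
log_probs = torch.tensor([-1.2, -0.7, -2.3])  # actor log-probs for sampled tokens
ref_log_probs = log_probs.clone()             # reference == actor at step 0

kl_divergence_estimate = -kl_ctl * (log_probs - ref_log_probs)
print(kl_divergence_estimate)  # all zeros, so no KL penalty yet

# Once the actor is updated, log_probs drift away from ref_log_probs and the
# penalty becomes non-zero, pulling the policy back toward the reference model.
```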

Thank you @liwyNo. Yes, that only holds for the first step.

yaozhewei avatar May 01 '23 18:05 yaozhewei

> At the beginning of stage 3, kl_divergence_estimate should be zero. But after several steps, the actor model's generations might differ from the reference model's. Please correct me if I'm mistaken.

I found that the actor model has LoRA parameters but the reference model doesn't, so they may differ even at the first step.

janelu9 avatar May 10 '23 05:05 janelu9

> At the beginning of stage 3, kl_divergence_estimate should be zero. But after several steps, the actor model's generations might differ from the reference model's. Please correct me if I'm mistaken.

> I found that the actor model has LoRA parameters but the reference model doesn't, so they may differ even at the first step.

For LoRA, the lora_left_weight (https://github.com/microsoft/DeepSpeedExamples/blob/master/applications/DeepSpeed-Chat/training/utils/module/lora.py#LL39C14-L39C30) is initialized as a zero matrix, which means the LoRA weights won't affect the actor model at the first step.

liwyNo avatar May 10 '23 06:05 liwyNo
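
To make that concrete, here is a minimal sketch of a generic LoRA linear layer (the class name, dimensions, and init values are illustrative, not the exact code behind the link): because one of the two low-rank factors starts as a zero matrix, the LoRA branch contributes exactly zero until the first optimizer update, so the actor's outputs initially match a LoRA-free reference model.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class LoRALinear(nn.Module):
    """Generic LoRA-augmented linear layer (illustrative, not the repo's class)."""
    def __init__(self, in_features, out_features, rank=8, scaling=1.0):
        super().__init__()
        # Base weights, conceptually shared with the reference model.
        self.weight = nn.Parameter(torch.randn(out_features, in_features))
        self.bias = nn.Parameter(torch.zeros(out_features))
        # One low-rank factor gets a random init ...
        self.lora_right_weight = nn.Parameter(torch.randn(in_features, rank) * 0.01)
        # ... the other (the "lora_left_weight" in the linked code) starts as zeros,
        # so the LoRA branch is a no-op before the first optimizer step.
        self.lora_left_weight = nn.Parameter(torch.zeros(rank, out_features))
        self.scaling = scaling

    def forward(self, x):
        base = F.linear(x, self.weight, self.bias)
        lora = (x @ self.lora_right_weight @ self.lora_left_weight) * self.scaling
        return base + lora  # lora is exactly zero at initialization

x = torch.randn(2, 16)
layer = LoRALinear(16, 32)
# At step 0 the LoRA-augmented output equals the plain linear output.
assert torch.allclose(layer(x), F.linear(x, layer.weight, layer.bias))
```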

> At the beginning of stage 3, kl_divergence_estimate should be zero. But after several steps, the actor model's generations might differ from the reference model's. Please correct me if I'm mistaken.

> I found that the actor model has LoRA parameters but the reference model doesn't, so they may differ even at the first step.

> For LoRA, the lora_left_weight (https://github.com/microsoft/DeepSpeedExamples/blob/master/applications/DeepSpeed-Chat/training/utils/module/lora.py#LL39C14-L39C30) is initialized as a zero matrix, which means the LoRA weights won't affect the actor model at the first step.

Well, so it is.

janelu9 avatar May 10 '23 08:05 janelu9