If ref_model is a copy of act_model at the beginning of stage 3, does that mean the kl_divergence is 0?
```python
kl_divergence_estimate = -self.kl_ctl * (log_probs - ref_log_probs)
```
At the beginning of stage 3, kl_divergence_estimate should be zero. But after several steps, the generations of the actor model might differ from the reference model. Please correct me if I've made any mistake.
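To make that concrete, here's a small sanity check (my own sketch, not from the repo; the model name and kl_ctl value are placeholders): two identical models produce identical per-token log-probs, so the estimate is exactly zero at step 0.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Placeholder model, just for illustration; any causal LM works here.
name = "facebook/opt-125m"
tokenizer = AutoTokenizer.from_pretrained(name)
actor = AutoModelForCausalLM.from_pretrained(name).eval()
ref = AutoModelForCausalLM.from_pretrained(name).eval()  # identical copy

inputs = tokenizer("Hello, RLHF!", return_tensors="pt")
with torch.no_grad():
    actor_logits = actor(**inputs).logits
    ref_logits = ref(**inputs).logits

log_probs = torch.log_softmax(actor_logits, dim=-1)
ref_log_probs = torch.log_softmax(ref_logits, dim=-1)

kl_ctl = 0.1  # placeholder KL coefficient, standing in for self.kl_ctl
kl_divergence_estimate = -kl_ctl * (log_probs - ref_log_probs)
print(kl_divergence_estimate.abs().max())  # tensor(0.) while the models match
```

Once the actor takes a few PPO updates, its logits drift away from the reference model's and this term becomes nonzero.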
Thank you @liwyNo. Yes, that's only true for the first step.
I found that the actor model has LoRA parameters but the reference model doesn't, so they may be different even at the first step.
For LoRA, the lora_left_weight (https://github.com/microsoft/DeepSpeedExamples/blob/master/applications/DeepSpeed-Chat/training/utils/module/lora.py#LL39C14-L39C30) is initialized as a zero matrix, which means the LoRA weights won't affect the actor model at the first step.
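A minimal sketch of why that holds (my own simplification of the linked LinearLayer_LoRA, not the exact DeepSpeed-Chat code; class and argument names are hypothetical): because lora_left_weight starts as a zero matrix, the LoRA branch contributes exactly zero, and the wrapped layer's output matches the base layer until training updates it.

```python
import math
import torch
import torch.nn as nn

class LoraLinear(nn.Module):
    """Simplified LoRA wrapper: base(x) + x @ right @ left."""
    def __init__(self, base: nn.Linear, lora_dim: int = 8):
        super().__init__()
        self.base = base
        self.lora_right_weight = nn.Parameter(torch.empty(base.in_features, lora_dim))
        # Zero init: the low-rank delta is identically zero at step 0.
        self.lora_left_weight = nn.Parameter(torch.zeros(lora_dim, base.out_features))
        nn.init.kaiming_uniform_(self.lora_right_weight, a=math.sqrt(5))

    def forward(self, x):
        # The delta x @ right @ left is a zero tensor while left is all zeros.
        return self.base(x) + x @ self.lora_right_weight @ self.lora_left_weight

base = nn.Linear(16, 16)
layer = LoraLinear(base)
x = torch.randn(2, 16)
assert torch.equal(layer(x), base(x))  # identical before any parameter updates
```

So even though the actor carries extra LoRA parameters that the reference model lacks, the two produce the same outputs at the first step, and the KL estimate is still zero there.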
Well, it is.