
Parameter freezing in the learn() function of TD3

Open Cassini-Titan opened this issue 1 year ago • 0 comments

Hi, I have a question about the delayed policy update in TD3, which involves freezing the critic's parameters. It seems to me that removing the freeze and unfreeze operations would not affect the code, because nothing in between updates the Critic network. Code:

    # Trick 3: delayed policy updates
    if self.actor_pointer % self.policy_freq == 0:
        # Freeze the critic networks so you don't waste computational effort
        # ***** freeze section *********************************************
        for params in self.critic.parameters():
            params.requires_grad = False  # removing the freeze/unfreeze parts seems to have no effect?
        # ******************************************************************

        # Compute the actor loss
        actor_loss = -self.critic.Q1(batch_s, self.actor(batch_s)).mean()  # Only use Q1
        # Optimize the actor
        self.actor_optimizer.zero_grad()
        actor_loss.backward()
        self.actor_optimizer.step()

        # Unfreeze the critic networks
        # ***** unfreeze section *******************************************
        for params in self.critic.parameters():
            params.requires_grad = True
        # ******************************************************************

        # Softly update the target networks
        for param, target_param in zip(self.critic.parameters(), self.critic_target.parameters()):
            target_param.data.copy_(self.TAU * param.data + (1 - self.TAU) * target_param.data)

        for param, target_param in zip(self.actor.parameters(), self.actor_target.parameters()):
            target_param.data.copy_(self.TAU * param.data + (1 - self.TAU) * target_param.data)
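
To check this, here is a minimal sketch (my own test code, not from the repo; the tiny nn.Linear networks are hypothetical stand-ins for the actor and critic):

    import torch
    import torch.nn as nn

    torch.manual_seed(0)
    actor = nn.Linear(4, 2)   # stand-in actor: state -> action
    critic = nn.Linear(6, 1)  # stand-in critic: concat(state, action) -> Q
    actor_optimizer = torch.optim.Adam(actor.parameters(), lr=1e-3)

    batch_s = torch.randn(8, 4)
    critic_before = [p.detach().clone() for p in critic.parameters()]

    # Actor update WITHOUT freezing the critic
    actor_loss = -critic(torch.cat([batch_s, actor(batch_s)], dim=1)).mean()
    actor_optimizer.zero_grad()
    actor_loss.backward()
    actor_optimizer.step()

    # The critic's weights are unchanged: only actor_optimizer.step() ran,
    # and it holds no critic parameters.
    for p, p0 in zip(critic.parameters(), critic_before):
        assert torch.equal(p, p0)

    # But critic gradients WERE computed (wasted work), and they sit in .grad
    # until the next critic_optimizer.zero_grad() clears them.
    print([p.grad is not None for p in critic.parameters()])  # [True, True]

So, assuming critic_optimizer.zero_grad() is called before every critic update (the standard pattern), removing the freeze/unfreeze does not change the learned weights. The freeze only skips computing gradients for the critic's parameters (backprop still flows through the critic's graph to reach the actor), which appears to be exactly what the "don't waste computational effort" comment refers to. Is that understanding correct?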

Cassini-Titan · Dec 11 '23