Human-in-the-loop-Deep-Reinforcement-Learning
a bug about critic update
Hello,
I find your work really helpful and I appreciate it a lot. However, I found a bug in the critic update stage that affects the final performance.
It is in TD3HUG.py at L80-L81:
noise1 = (torch.randn_like(ba) * self.policy_noise).clamp(0, 1)
a_ = (self.actor_target(bs_).detach() + noise1).clamp(0, 1)
I think the first argument to clamp should be the lower limit rather than 0, and noise1 should be clipped using the NOISE_CLIP hyperparameter, as in the original TD3 target-policy smoothing.
I changed these two lines to:
noise1 = (torch.randn_like(ba) * self.policy_noise).clamp(-self.noise_clip, self.noise_clip) # self.noise_clip refers to the NOISE_CLIP hyperparameter
a_ = (self.actor_target(bs_).detach() + noise1).clamp(-1, 1)
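For reference, here is a minimal self-contained sketch of the target-policy smoothing step the fix implements. The function name, the default hyperparameter values, and the action bounds [-1, 1] are assumptions for illustration, not taken from TD3HUG.py:

```python
import torch

def smoothed_target_action(actor_target, next_states, policy_noise=0.2,
                           noise_clip=0.5, act_low=-1.0, act_high=1.0):
    """TD3 target-policy smoothing: add clipped Gaussian noise to the
    target actor's action, then clamp to the action bounds."""
    with torch.no_grad():
        a = actor_target(next_states)
        # Noise is clipped symmetrically to [-noise_clip, noise_clip] ...
        noise = (torch.randn_like(a) * policy_noise).clamp(-noise_clip, noise_clip)
        # ... and the perturbed action is clamped to the action-space limits.
        return (a + noise).clamp(act_low, act_high)
```

With clamp(0, 1) the noise can never be negative and the action can never go below 0, which biases the target Q-value estimate whenever the action space includes negative values.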
There may be similar clamp bugs elsewhere, but I haven't checked.
I would really appreciate it if you could look into this.