Human-in-the-loop-Deep-Reinforcement-Learning

A bug in the critic update

Open EvergrowHook opened this issue 1 year ago • 1 comments

Hello,

I find your work really helpful and I appreciate it. However, I found a bug in the critic update stage that affects the final performance.

It is in TD3HUG.py at L80-L81

noise1 = (torch.randn_like(ba) * self.policy_noise).clamp(0, 1)
a_ = (self.actor_target(bs_).detach() + noise1).clamp(0, 1)

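I think the first argument to clamp should be the lower limit of the range rather than 0. As written, clamp(0, 1) sets every negative noise sample to 0, so the smoothing noise can only push the target action upward. I also think noise1 should be clipped with the NOISE_CLIP hyperparameter.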

I changed these two lines to:

noise1 = (torch.randn_like(ba) * self.policy_noise).clamp(-self.noise_clip, self.noise_clip)  # self.noise_clip refers to the NOISE_CLIP hyperparameter
a_ = (self.actor_target(bs_).detach() + noise1).clamp(-1, 1)
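
To illustrate why the clamp bounds matter, here is a minimal standalone sketch in plain Python. It is not the repository's code: the scalar `clamp` helper, the value of `mu`, and the values chosen for `sigma` and `noise_clip` are assumptions made for illustration, and the action range of [-1, 1] is assumed from my proposed change above.

```python
import random

def clamp(x, lo, hi):
    """Scalar stand-in for torch.clamp: limit x to the interval [lo, hi]."""
    return max(lo, min(hi, x))

def smoothed_action(mu, sigma, noise_lo, noise_hi, act_lo, act_hi, rng):
    """Compute clamp(mu + clamp(noise)), mirroring the two lines above."""
    noise = clamp(rng.gauss(0.0, sigma), noise_lo, noise_hi)
    return clamp(mu + noise, act_lo, act_hi)

rng = random.Random(0)
mu = 0.0           # hypothetical output of the target actor
sigma = 0.2        # stands in for self.policy_noise (assumed value)
noise_clip = 0.5   # stands in for NOISE_CLIP (assumed value)
n = 10_000

# Original bounds: noise clamped to [0, 1], action clamped to [0, 1].
original = [smoothed_action(mu, sigma, 0.0, 1.0, 0.0, 1.0, rng)
            for _ in range(n)]
# Proposed bounds: noise clamped to [-noise_clip, noise_clip], action to [-1, 1].
proposed = [smoothed_action(mu, sigma, -noise_clip, noise_clip, -1.0, 1.0, rng)
            for _ in range(n)]

mean_original = sum(original) / n
mean_proposed = sum(proposed) / n
print(f"original clamp: min={min(original):.3f}, mean={mean_original:.3f}")
print(f"proposed clamp: min={min(proposed):.3f}, mean={mean_proposed:.3f}")
```

With the original bounds, the smoothed action never goes below `mu` and its mean is shifted upward. With the proposed bounds, the noise stays roughly symmetric around zero, which is what target policy smoothing in TD3 relies on.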

There might also be similar clamp bugs elsewhere, but I haven't checked.

I would really appreciate it if you could look into this.

EvergrowHook, Apr 04 '23 08:04