Youtube-Code-Repository
noise size equal to number of actions
https://github.com/philtabor/Youtube-Code-Repository/blob/733e4526f9920e5b710e29077fb85a457eec1ea9/ReinforcementLearning/PolicyGradient/TD3/td3_torch.py#L163
Instead of a scalar noise, the noise should be a vector whose size equals the number of actions: mu_prime = mu + T.tensor(np.random.normal(scale=self.noise,size=(self.n_actions,)),
We're allowed to add a scalar quantity to a vector. Is there a reason why each component of the mu tensor should have a different random number added to it?
True, but perturbing each component of the mu tensor with a different random number increases exploration, compared with adding the same random number to every component.
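
To make the difference concrete, here is a minimal NumPy-only sketch of the two options discussed above. It is not the repository's code: `n_actions`, `noise`, and the zero-valued `mu` are hypothetical stand-ins, and the seeded generator is only there so the sketch is reproducible.

```python
import numpy as np

rng = np.random.default_rng(0)  # seeded only for reproducibility (assumption)
n_actions = 4                   # hypothetical action dimension
noise = 0.1                     # hypothetical noise scale
mu = np.zeros(n_actions)        # stand-in for the actor network's output

# Scalar noise: a single sample is drawn and broadcast,
# so every action component is shifted by the same amount.
scalar_noise = rng.normal(scale=noise)
mu_prime_scalar = mu + scalar_noise

# Vector noise: one independent sample per action component,
# so each component is perturbed differently.
vector_noise = rng.normal(scale=noise, size=(n_actions,))
mu_prime_vector = mu + vector_noise

print("scalar noise:", mu_prime_scalar)
print("vector noise:", mu_prime_vector)
```

With scalar noise the perturbation always lies along the direction (1, 1, ..., 1) in action space, whereas per-component noise can move the action in any direction, which is the sense in which it explores more.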