reinforcement-learning
Wrong max over next-state actions?
In QAgent's train(), the update is
self.Q[s,a] = self.Q[s,a] + self.lr * (r + self.gamma*np.max(self.Q[s_next,a]) - self.Q[s,a])
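but IMHO it should be as follows, since np.max of the single value self.Q[s_next,a] is a no-op, while the Q-learning target needs the max over all actions in s_next: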
self.Q[s,a] = self.Q[s,a] + self.lr * (r + self.gamma*np.max(self.Q[s_next,:]) - self.Q[s,a])
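To illustrate the difference, here is a minimal sketch with a made-up Q-table. The table values, the variable names, and the q_update helper are assumptions for illustration only and are not taken from the repository's QAgent class:

```python
import numpy as np

# Hypothetical 2-state, 3-action Q-table; values chosen only for illustration.
Q = np.array([[0.0, 0.0, 0.0],
              [1.0, 5.0, 2.0]])
s, a, r, s_next = 0, 0, 1.0, 1
lr, gamma = 0.5, 0.9

# Q[s_next, a] is a single number, so np.max returns it unchanged:
# the target bootstraps from the same action index a in the next state.
target_same_action = r + gamma * np.max(Q[s_next, a])

# Q[s_next, :] is the whole row, so np.max picks the best next action,
# which is what the tabular Q-learning target calls for.
target_greedy = r + gamma * np.max(Q[s_next, :])


def q_update(Q, s, a, r, s_next, lr, gamma):
    """Return a copy of Q after one tabular Q-learning step (illustrative helper)."""
    Q = Q.copy()
    td_target = r + gamma * np.max(Q[s_next, :])
    Q[s, a] += lr * (td_target - Q[s, a])
    return Q


Q_new = q_update(Q, s, a, r, s_next, lr, gamma)
print(target_same_action, target_greedy, Q_new[s, a])
```

With these numbers the two targets differ (the first uses Q[1, 0], the second uses the row maximum Q[1, 1]), so the original line would learn from a different bootstrap value whenever a is not the greedy action in s_next.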