reinforcement-learning
Fix bug in the per-action probabilities of the epsilon-greedy policy
The probability of the best action is 1 - epsilon, so the probabilities of the remaining actions must sum to epsilon. Since there are nA - 1 remaining actions, each one should get epsilon / (nA - 1).
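Here is a minimal sketch of the distribution described above. It is a standalone illustration, not the repository's actual code. The function name `epsilon_greedy_probs` and the plain list of Q-values are hypothetical. Ties for the best action go to the lowest index.

```python
import math


def epsilon_greedy_probs(q_values, epsilon):
    """Return action probabilities: 1 - epsilon for the greedy action and
    epsilon split evenly over the other nA - 1 actions (hypothetical helper)."""
    nA = len(q_values)
    if nA == 1:
        # A single action must be chosen with probability 1.
        return [1.0]
    # max() returns the first maximal index, so ties favour the lowest index.
    best_action = max(range(nA), key=lambda a: q_values[a])
    # Each non-greedy action gets an equal share of epsilon.
    probs = [epsilon / (nA - 1)] * nA
    # The greedy action gets exactly 1 - epsilon.
    probs[best_action] = 1.0 - epsilon
    return probs


# Usage: 4 actions with epsilon = 0.3, so the best action gets 0.7
# and each of the other 3 actions gets 0.1.
probs = epsilon_greedy_probs([0.1, 0.5, 0.2, 0.0], epsilon=0.3)
assert math.isclose(sum(probs), 1.0)
print(probs)
```

The probabilities still sum to 1, because (1 - epsilon) + (nA - 1) * epsilon / (nA - 1) = 1.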