mdp
Update the step function in the MDPEnv class
I found a bug in the library and fixed it.
Modification:
The change is in the step function of MDPEnv.
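Explanation:
Previously, next_state and reward were sampled independently of each other, so the pair returned by step was not synchronized: the reward did not necessarily belong to the transition that was actually taken. By definition, the MDP dynamics give a joint distribution over (next_state, reward) for a given state and action, so both values must come from the same sampled outcome.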
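To illustrate the difference, here is a minimal sketch, assuming the environment stores its dynamics as a table mapping (state, action) to a list of (probability, next_state, reward) outcomes. The class below, the table layout, and the helper step_independent are hypothetical and are not the library's actual MDPEnv code; step_independent reproduces the buggy pattern, while MDPEnv.step samples one outcome and reads both values from it.

```python
import random
from typing import Dict, List, Tuple

# Hypothetical transition table: (state, action) -> list of (prob, next_state, reward).
# Each tuple is one outcome of the joint distribution over (next_state, reward).
Transitions = Dict[Tuple[str, str], List[Tuple[float, str, float]]]


def step_independent(transitions: Transitions, state: str, action: str,
                     rng: random.Random) -> Tuple[str, float]:
    """Buggy pattern (for contrast): next_state and reward come from two separate draws."""
    outcomes = transitions[(state, action)]
    weights = [p for p, _, _ in outcomes]
    next_state = rng.choices([s for _, s, _ in outcomes], weights=weights, k=1)[0]
    reward = rng.choices([r for _, _, r in outcomes], weights=weights, k=1)[0]
    return next_state, reward


class MDPEnv:
    """Minimal sketch of an MDP environment (hypothetical, not the library's class)."""

    def __init__(self, transitions: Transitions, start_state: str, seed=None):
        self.transitions = transitions
        self.state = start_state
        self.rng = random.Random(seed)

    def step(self, action: str) -> Tuple[str, float]:
        outcomes = self.transitions[(self.state, action)]
        weights = [p for p, _, _ in outcomes]
        # Draw ONE outcome index, then take next_state and reward from the same
        # tuple, so the pair is sampled jointly rather than independently.
        idx = self.rng.choices(range(len(outcomes)), weights=weights, k=1)[0]
        _, next_state, reward = outcomes[idx]
        self.state = next_state
        return next_state, reward
```

For example, with the table {("s0", "go"): [(0.5, "good", 1.0), (0.5, "bad", -1.0)]}, the joint version only ever returns ("good", 1.0) or ("bad", -1.0), whereas the independent version can also return ("good", -1.0) or ("bad", 1.0), pairs that have zero probability under the model.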