Updated the step function inside the MDPEnv class

Open vaishn99 opened this issue 2 years ago • 0 comments

I noticed a bug in the library and fixed it.

Modification:

The change is to the step function inside MDPEnv.

Explanation:

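Previously, next_state and reward were each sampled independently, so the returned pair was not synchronized. By definition they should be synchronized: both values must come from the same sampled outcome, otherwise step can return a reward that does not belong to the returned next state.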
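To illustrate the difference, here is a minimal sketch. It is not the library's actual code: the transition table, the function names (step_independent, step_joint), and the states and rewards are all hypothetical, chosen only to show the two sampling patterns side by side.

```python
import random

# Hypothetical transition table (an assumption for illustration only):
# (state, action) -> list of (probability, next_state, reward) outcomes.
TRANSITIONS = {
    ("s0", "a"): [(0.5, "s1", 1.0), (0.5, "s2", -1.0)],
}


def step_independent(state, action, rng):
    """Buggy pattern: next_state and reward are drawn in two separate samples."""
    outcomes = TRANSITIONS[(state, action)]
    probs = [p for p, _, _ in outcomes]
    next_state = rng.choices([s for _, s, _ in outcomes], weights=probs)[0]
    reward = rng.choices([r for _, _, r in outcomes], weights=probs)[0]
    return next_state, reward


def step_joint(state, action, rng):
    """Synchronized pattern: draw one outcome and take both values from it."""
    outcomes = TRANSITIONS[(state, action)]
    probs = [p for p, _, _ in outcomes]
    _, next_state, reward = rng.choices(outcomes, weights=probs)[0]
    return next_state, reward


# Pairs that the hypothetical table actually allows.
VALID_PAIRS = {(s, r) for outs in TRANSITIONS.values() for _, s, r in outs}

rng = random.Random(0)
joint_samples = [step_joint("s0", "a", rng) for _ in range(1000)]
independent_samples = [step_independent("s0", "a", rng) for _ in range(1000)]

# Joint sampling only ever produces pairs listed in the table.
print(all(pair in VALID_PAIRS for pair in joint_samples))
# Independent sampling can produce mismatched pairs such as ("s1", -1.0).
print(any(pair not in VALID_PAIRS for pair in independent_samples))
```

With independent sampling, a pair such as ("s1", -1.0) can appear even though the table never pairs that state with that reward. Sampling a single outcome and reading both values from it avoids this.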

Jan 23 '23 09:01 vaishn99