mlsh

Wrong environment reset?

Open • TheCrazyT opened this issue 5 years ago • 1 comment

See: https://github.com/openai/mlsh/blob/master/mlsh_code/rollouts.py#L83

Are you sure the "and" condition there is correct? It means the environment is not reset even though the goal has already been reached: the rollout keeps going for another x steps until macrolen is reached, although it's clear that no further reward will come. It also means the environment is never reset at all once you are in a state where the goal can no longer be reached. An "or" would make more sense, in my opinion.
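To make the two cases concrete, here is a toy simulation. It is not the mlsh code: the loop, the `(t + 1) % macrolen` boundary check and the "sticky" done flag (done stays True after the goal until a reset) are all assumptions for illustration.

```python
def first_reset_step(done_at: int, macrolen: int, horizon: int, use_and: bool = True):
    """Hypothetical rollout loop: return the first step t at which the env is reset,
    or None if no reset happens within `horizon` steps.

    Assumes the episode reports done=True from step `done_at` onward (sticky flag).
    """
    for t in range(horizon):
        done = t >= done_at                      # goal reached at done_at, flag stays set
        at_boundary = (t + 1) % macrolen == 0    # assumed end of a macro period
        reset = (done and at_boundary) if use_and else (done or at_boundary)
        if reset:
            return t
    return None


# Goal reached at step 12, macrolen = 10:
print(first_reset_step(12, 10, 100, use_and=True))   # 19: seven extra steps after the goal
print(first_reset_step(12, 10, 100, use_and=False))  # 9: resets at the first boundary, mid-episode
# Goal never reached within the horizon:
print(first_reset_step(100, 10, 100, use_and=True))  # None: never reset
```

In this sketch, "and" wastes at most macrolen - 1 steps after the goal, while "or" also resets at every boundary even when the episode is still running.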

TheCrazyT, Apr 24 '19 15:04

Alright, I guess I misunderstood macro_duration, since it seems to define the tick at which a new substrategy is chosen (I probably mixed it up with "max_episode_steps"). Using an "or" would be wrong, too. At first I thought it was an exit timer, i.e. that the current iteration stops if it does not reach the goal in time.
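Under this reading, macro_duration is a decision interval rather than a timeout. A minimal sketch, assuming the master policy picks a subpolicy every macro_duration steps; the function names and the random stand-in for the learned master policy are hypothetical:

```python
import random


def decision_ticks(num_steps: int, macro_duration: int) -> list[int]:
    """Steps at which a new subpolicy would be chosen under this reading."""
    return [t for t in range(num_steps) if t % macro_duration == 0]


def subpolicy_schedule(num_steps: int, macro_duration: int,
                       num_subpolicies: int, seed: int = 0) -> list[int]:
    """Hypothetical sketch: pick a subpolicy index at every decision tick and keep
    using it until the next tick. A seeded random choice stands in for the master."""
    rng = random.Random(seed)
    schedule = []
    current = 0
    for t in range(num_steps):
        if t % macro_duration == 0:              # decision tick: choose a new subpolicy
            current = rng.randrange(num_subpolicies)
        schedule.append(current)
    return schedule


print(decision_ticks(25, 10))         # [0, 10, 20]
print(subpolicy_schedule(25, 10, 3))  # one index per step, constant within each 10-step block
```

Nothing in this loop ends the episode; that is the job of the done flag or a step cap such as max_episode_steps.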

But my point that the environment is not reset even after the goal has been reached is still valid, although it should not matter much if macro_duration is not that big.

Edit: Another problem is that some other environments return True for the done/new flag only once (in that case the current code just ignores it). But I'm not sure whether that's wrong behaviour on the part of mlsh or of the environment.
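A toy illustration of the one-shot case, again assuming a `done and (t + 1) % macrolen == 0` style check rather than quoting mlsh. The latching variant is one hypothetical workaround, not something from the repository. It still steps the environment between the done signal and the boundary, which many environments do not define.

```python
def resets_with_and(done_flags: list[bool], macrolen: int) -> list[int]:
    """Steps at which a loop using `done and at_boundary` would reset."""
    return [t for t, done in enumerate(done_flags) if done and (t + 1) % macrolen == 0]


def resets_with_latched_done(done_flags: list[bool], macrolen: int) -> list[int]:
    """Hypothetical workaround: remember a done signal until the next boundary."""
    resets, pending = [], False
    for t, done in enumerate(done_flags):
        pending = pending or done
        if pending and (t + 1) % macrolen == 0:
            resets.append(t)
            pending = False
    return resets


# Environment that reports done=True only at step 12, then False again.
one_shot = [t == 12 for t in range(40)]
print(resets_with_and(one_shot, 10))           # []: the single done signal is never acted on
print(resets_with_latched_done(one_shot, 10))  # [19]
```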

TheCrazyT, May 10 '19 16:05