smac
Environment inconsistency bug when reset() is called twice at the end of an episode.
We found that when SMAC is reset twice at the end of an episode, the environment misbehaves for unknown reasons and produces strange results. For example, a model that should reach a 95%+ win rate drops below a 50% win rate.
Steps to reproduce:
- take a model trained on map MMM2 and freeze it for evaluation
- change `res = self._env.reset()` to `self._env.reset(); res = self._env.reset()` (reset twice)
- observe a significant decline in win rate
Although calling reset() twice can easily be avoided with an if-else guard, this is clearly a bug that can cause trouble for others.
Maybe add a notice warning users not to call reset() more than once in a row.
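The if-else workaround mentioned above could be packaged as a wrapper around the environment. The sketch below is hypothetical: `SingleResetGuard` and `CountingEnv` are illustrative names, not part of SMAC, and `CountingEnv` only stands in for the real environment. It assumes that a reset() issued before any step() since the previous reset can safely return the cached result of that previous reset. It masks the double reset and emits a warning; it does not fix the underlying bug in SMAC.

```python
import warnings
from typing import Any


class SingleResetGuard:
    """Hypothetical wrapper: skips a reset() issued before any step() since the last reset."""

    def __init__(self, env: Any) -> None:
        self._env = env
        self._fresh = False           # True right after a real reset, until the next step()
        self._last_reset: Any = None  # cached return value of the last real reset()

    def reset(self, *args: Any, **kwargs: Any) -> Any:
        if self._fresh:
            # Second reset in a row: warn and reuse the previous result instead of resetting again.
            warnings.warn("reset() called twice in a row; reusing the previous reset result",
                          RuntimeWarning, stacklevel=2)
            return self._last_reset
        self._last_reset = self._env.reset(*args, **kwargs)
        self._fresh = True
        return self._last_reset

    def step(self, *args: Any, **kwargs: Any) -> Any:
        self._fresh = False  # any step makes the next reset() a real one
        return self._env.step(*args, **kwargs)

    def __getattr__(self, name: str) -> Any:
        # Forward every other attribute (get_obs, close, ...) to the wrapped env.
        if name == "_env":
            raise AttributeError(name)
        return getattr(self._env, name)


class CountingEnv:
    """Stand-in for the real environment (assumption); it only counts reset() calls."""

    def __init__(self) -> None:
        self.resets = 0

    def reset(self) -> tuple:
        self.resets += 1
        return ("obs", self.resets)

    def step(self, actions: Any) -> tuple:
        return 0.0, False, {}


if __name__ == "__main__":
    env = SingleResetGuard(CountingEnv())
    env.reset(); res = env.reset()  # second call is skipped and emits a RuntimeWarning
    print(env.resets)               # the underlying env was reset only once
```

Returning the cached result keeps existing runner code unchanged; the alternative is a flag in the runner itself that only calls reset() when the episode has actually ended.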