Potential memory leak from EnvPlayer
Memory use seems to grow in several places as matches pile up, and one contributor appears to be _reward_buffer in EnvPlayer. It holds references to AbstractBattle objects and never releases them. An easy fix is to initialize it as a weakref.WeakKeyDictionary instead. The buffer then no longer keeps its keys (the battles) alive, so once a battle is deleted everywhere else, its entry is dropped from the dict automatically. I'm still testing and looking for other areas that could be causing the memory leaks.
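To illustrate the difference, here is a minimal sketch rather than the poke-env code itself. `FakeBattle` and `surviving_entries` are hypothetical names made up for this example, and it assumes the battle objects are hashable and support weak references, which `WeakKeyDictionary` requires of its keys.

```python
import gc
import weakref


class FakeBattle:
    """Hypothetical stand-in for an AbstractBattle; not the poke-env class."""

    def __init__(self, tag):
        self.tag = tag


def surviving_entries(buffer_factory):
    """Store a reward keyed by a battle, drop the battle, and count what is left."""
    reward_buffer = buffer_factory()
    battle = FakeBattle("battle-1")
    reward_buffer[battle] = 0.0
    del battle    # drop the only reference held outside the buffer
    gc.collect()  # makes the intent explicit; CPython usually frees it already
    return len(reward_buffer)


# A plain dict keeps the battle alive; a WeakKeyDictionary lets it go.
strong_count = surviving_entries(dict)
weak_count = surviving_entries(weakref.WeakKeyDictionary)
print("dict entries left:", strong_count)
print("WeakKeyDictionary entries left:", weak_count)
```

With a plain dict the entry stays because the dict itself holds a strong reference to the key. With the weak variant the entry goes away once nothing else refers to the battle.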
Making that change, and also resetting the opponents' battles, seems to have fixed the Battle memory leak. This may also clear up the performance degradation over time that others have noticed.
Great work! Out of curiosity, which tool did you use to monitor garbage collection and detect this?
I used Pympler in a training callback to track memory through the number of live objects. The number of battles kept increasing each iteration even though I had modified the code to delete them and called the garbage collector manually, so a reference to them had to be hanging around somewhere. Looking through the code, I found _reward_buffer and guessed it might be what was holding on to them. After clearing it, the memory leak went away.
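For anyone who wants to try the same kind of check, here is a rough sketch of the idea. It is not the callback described above: it swaps Pympler for the standard library `gc` module so that it runs without extra dependencies. `FakeBattle`, `count_instances`, and `leaky_buffer` are hypothetical names for this example only.

```python
import gc


class FakeBattle:
    """Hypothetical stand-in for a battle object."""


def count_instances(cls):
    """Count live instances of cls that the garbage collector tracks."""
    gc.collect()
    return sum(1 for obj in gc.get_objects() if type(obj) is cls)


leaky_buffer = {}  # plays the role of a buffer that never releases its keys
counts = []
for iteration in range(3):
    battle = FakeBattle()
    leaky_buffer[battle] = 0.0
    del battle  # our own reference is gone, but the buffer still holds one
    counts.append(count_instances(FakeBattle))

print("live battles per iteration:", counts)

# Clearing the buffer removes the last references, so the count drops back.
leaky_buffer.clear()
after_clear = count_instances(FakeBattle)
print("live battles after clearing:", after_clear)
```

A count that keeps climbing after deleting the objects and forcing a collection suggests that something still holds references to them, which is the pattern that pointed to the buffer here.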
However, the performance degradation over time that others have noticed still exists. I'm still debugging that one, and I'm not sure whether it's a poke-env, Showdown, or Stable-Baselines3 issue. I'll open an issue if I narrow it down.