gym-mupen64plus

Better long-term architecture

Open • bzier opened this issue 6 years ago • 1 comment

Define better long-term architecture for specifying/configuring environment variations:

  • Character (including random)
  • Course (including random)
  • Reward system
    • Checkpoint enablement
  • Action space
    • Continuous
    • Discrete
  • etc.

Currently, each variation is created as a subclass and registered as a named gym environment. Some research is needed to determine the best approach as more variations are added, including how to version them appropriately and manage backwards compatibility.

At the moment this is being driven by MarioKart (no pun intended), but the implications affect future game implementations as well.

bzier, Nov 12 '17 02:11

Having looked at this a little, it seems that gym registration allows passing arguments to the constructor of the environment class (see here). I had missed that detail before, but it should make it simpler to handle environment variations without creating entire subclasses just to alter a few things.
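As a rough sketch of what that could look like, the variations can be described as data and handed to gym's `register` function, which accepts a `kwargs` dict that is forwarded to the environment constructor. The ids, the entry point path, and the parameter names (`character`, `course`) below are assumptions for illustration, not the project's actual registrations. The registration function is passed in so the sketch runs without gym installed.

```python
# Hypothetical variation table: every id, the entry point path, and the
# constructor parameter names are illustrative assumptions.
ENTRY_POINT = 'gym_mupen64plus.envs.MarioKart64:MarioKartEnv'

VARIATIONS = [
    {'id': 'Mario-Kart-Luigi-Raceway-v0',
     'entry_point': ENTRY_POINT,
     'kwargs': {'character': 'mario', 'course': 'LuigiRaceway'}},
    {'id': 'Mario-Kart-Random-v0',
     'entry_point': ENTRY_POINT,
     'kwargs': {'character': 'random', 'course': 'random'}},
]


def register_all(register_fn, variations=VARIATIONS):
    """Register each variation; `kwargs` is forwarded to the env constructor."""
    for spec in variations:
        # Each spec expands to register_fn(id=..., entry_point=..., kwargs=...)
        register_fn(**spec)
    return [spec['id'] for spec in variations]
```

With gym installed, this would be called as `register_all(register)` after `from gym.envs.registration import register`. A new variation is then just another entry in the table, and existing ids stay untouched.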

At the moment, I think it makes sense to approach this from an IoC/DI (inversion of control / dependency injection) perspective. I'd like to create a framework for hooking into certain points in the environment (e.g. _get_reward(), _reset(), etc.), then refactor out injectable components (e.g. a reward function component). A new reward function can then be created without affecting an existing one. A new environment variation can be registered which injects the new component, while existing registrations remain unchanged (as gym conventions require).
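A minimal sketch of the injection idea, assuming the reward logic is a callable that receives some per-step state. The class names, the `state` dict, and its `checkpoint_reached` key are all hypothetical; only the `_get_reward()` hook name comes from the existing environment.

```python
class StepPenaltyReward:
    """Hypothetical component: a constant penalty on every step."""

    def __init__(self, penalty=-1.0):
        self.penalty = penalty

    def __call__(self, state):
        return self.penalty


class CheckpointReward:
    """Hypothetical component: bonus when a checkpoint is reached."""

    def __init__(self, bonus=100.0, penalty=-1.0):
        self.bonus = bonus
        self.penalty = penalty

    def __call__(self, state):
        if state.get('checkpoint_reached', False):
            return self.bonus
        return self.penalty


class RewardHookEnv:
    """Stand-in for the real environment class; only the hook is shown."""

    def __init__(self, reward_fn=None):
        # Falling back to a default keeps callers that pass nothing working.
        if reward_fn is None:
            reward_fn = StepPenaltyReward()
        self._reward_fn = reward_fn

    def _get_reward(self, state):
        # Hook point: delegate to whichever component was injected.
        return self._reward_fn(state)
```

Under this arrangement, a new registration could pass `CheckpointReward()` as `reward_fn` while an older one keeps its original component, so neither affects the other.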

Simpler, more static variations, or things that only affect initialization (like character and course choices), can just be parameters to the constructor; there is no need for a whole subcomponent for those.
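For the constructor-parameter case, something like the following might be enough. The option lists are illustrative subsets, and the names `resolve_choice` and `ConfiguredEnv`, the `'random'` sentinel, and the choice to resolve it on each reset are all assumptions rather than existing code.

```python
import random

# Illustrative subsets only; the real lists would come from the game.
CHARACTERS = ('mario', 'luigi', 'peach', 'toad')
COURSES = ('LuigiRaceway', 'MooMooFarm', 'KoopaTroopaBeach')


def resolve_choice(value, options, rng=random):
    """Return `value` if it is valid, or a random pick for 'random'."""
    if value == 'random':
        return rng.choice(options)
    if value not in options:
        raise ValueError('unknown option: %r' % (value,))
    return value


class ConfiguredEnv:
    """Stand-in environment that takes its variation as plain parameters."""

    def __init__(self, character='mario', course='LuigiRaceway', rng=random):
        self._character_spec = character
        self._course_spec = course
        self._rng = rng
        self.character = None
        self.course = None

    def _reset(self):
        # Resolve on every reset so 'random' gives a fresh pick per episode.
        self.character = resolve_choice(
            self._character_spec, CHARACTERS, self._rng)
        self.course = resolve_choice(self._course_spec, COURSES, self._rng)
        return self.character, self.course
```

Passing a seeded `random.Random` instance as `rng` would make the random picks reproducible across runs.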

Since the core system is working fairly well and doesn't require many changes at this point, this should help to stabilize things. Changes to the core of the environment can then, hopefully, be limited to non-functional improvements (such as performance) that leave behavior unchanged. It may also become possible to add unit/regression tests that catch unintended side effects when that core code is improved.

Experiments should be easier to manage in isolation without worrying about breaking others. If you want to try a new reward system, there's no need to change the existing one; just add a new one.

Ok, dead horse has been beaten... Just needed to get the thoughts out of my head.

bzier, Nov 25 '17 02:11