
[FEATURE] Add support for efficient recurrent models

Open · smorad opened this issue 4 months ago · 1 comment

Feature

Revisiting Recurrent Reinforcement Learning with Memory Monoids provides a method for combining recurrent models with standard, non-recurrent RL losses. This would add support for recurrent models such as S5, the LRU, FFM, and the Linear Transformer in Stoix. Note that models like the LSTM or GRU would be intractable under this paradigm, because their state updates are not associative and therefore cannot be computed with a parallel scan; they would require significantly more effort to integrate into Stoix.
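
For intuition: a memory monoid is just an identity element plus an associative binary operator over recurrent states, and associativity is what lets `jax.lax.associative_scan` replace the sequential RNN loop with a parallel scan of O(log T) depth. A minimal sketch (the discounted-sum monoid here is illustrative only, not a Stoix API):

```python
import jax
import jax.numpy as jnp

# Illustrative memory monoid: exponentially discounted sums.
# An element is a (decay, value) pair; the identity element is (1.0, 0.0).
def combine(a, b):
    decay_a, val_a = a
    decay_b, val_b = b
    # Associative: combine(combine(a, b), c) == combine(a, combine(b, c)).
    return decay_a * decay_b, val_b + decay_b * val_a

T = 8
decays = jnp.full((T,), 0.9)
values = jnp.arange(T, dtype=jnp.float32)

# Because `combine` is associative, this runs as a parallel scan of
# O(log T) depth rather than a sequential loop over T steps.
_, discounted_sums = jax.lax.associative_scan(combine, (decays, values))
```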

Proposal

  • Implement an abstract base class for recurrent models, with map_to_recurrent_state, map_from_recurrent_state, parallel_recurrent_update, initial_recurrent_state, and identity_element methods (see the interface sketch after this list).
  • Implement a general-purpose episodic reset operator using the initial_recurrent_state and identity_element methods (see the reset sketch below).
  • Implement one or more of S5, LRU, or FFM on top of the base class.
  • Create a make_recurrent_loss function that wraps a non-recurrent loss function (see the wrapper sketch below).
    • This function will scan over one long contiguous series of observations from the replay buffer, produce Markov states, and feed those Markov states into the wrapped non-recurrent loss function.
  • Demonstrate a sequence model + DQN on one or two POPGym tasks.
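
A minimal sketch of what the base class might look like; the class name and type annotations are hypothetical, and only the five method names come from the proposal above:

```python
from abc import ABC, abstractmethod
from typing import Any

RecurrentState = Any  # pytree of arrays; this alias is illustrative only


class MemoryMonoidCell(ABC):
    """Hypothetical base class exposing the five proposed methods."""

    @abstractmethod
    def initial_recurrent_state(self) -> RecurrentState:
        """State the model starts from at the beginning of an episode."""

    @abstractmethod
    def identity_element(self) -> RecurrentState:
        """Identity of the monoid: combining any state with it is a no-op."""

    @abstractmethod
    def map_to_recurrent_state(self, x: Any) -> RecurrentState:
        """Lift one input (e.g. an observation embedding) into a monoid element."""

    @abstractmethod
    def parallel_recurrent_update(
        self, a: RecurrentState, b: RecurrentState
    ) -> RecurrentState:
        """Associative binary operator, suitable for jax.lax.associative_scan."""

    @abstractmethod
    def map_from_recurrent_state(self, state: RecurrentState, x: Any) -> Any:
        """Project a recurrent state (plus the current input) to a Markov state."""
```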
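
The episodic reset operator can be built generically by tagging each element with a begin-of-episode flag and wrapping the cell's associative operator: wherever the right operand starts a new episode, the left operand's accumulated state is discarded, so nothing leaks across the boundary. A sketch assuming, for brevity, that the recurrent state is a single `(T, d)` array:

```python
import jax
import jax.numpy as jnp


def make_resettable(combine, identity):
    """Wrap an associative operator so that episode boundaries reset the scan.

    Elements become (start_flag, state) pairs. Where the right operand begins
    a new episode, the left operand's accumulated state is replaced by the
    monoid identity. The wrapped operator remains associative, so it can
    still be used with jax.lax.associative_scan. (Separately, elements at
    episode starts should be pre-combined with initial_recurrent_state.)
    """
    def resettable_combine(a, b):
        start_a, state_a = a
        start_b, state_b = b
        # In this sketch, states are (k, d) arrays and flags are (k,) bools.
        left = jnp.where(start_b[..., None], identity, state_a)
        return jnp.logical_or(start_a, start_b), combine(left, state_b)

    return resettable_combine
```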
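
make_recurrent_loss could then be a thin wrapper: lift the sequence into monoid elements, run one resettable parallel scan, project back to per-step Markov states, and hand those to the unchanged loss. All names below are hypothetical and reuse the sketches above:

```python
def make_recurrent_loss(loss_fn, cell):
    """Hypothetical wrapper turning a non-recurrent loss into a recurrent one.

    `cell` follows the MemoryMonoidCell sketch; `loss_fn` is any standard
    non-recurrent loss that consumes per-step Markov states.
    """
    def recurrent_loss(params, obs_seq, start_flags, *loss_args):
        # Lift each timestep of the contiguous sequence into a monoid element.
        elems = jax.vmap(cell.map_to_recurrent_state)(obs_seq)
        combine = make_resettable(
            cell.parallel_recurrent_update, cell.identity_element()
        )
        # One parallel scan over the whole sequence, resetting at episode starts.
        _, states = jax.lax.associative_scan(combine, (start_flags, elems))
        # Project each recurrent state back to a per-step Markov state.
        markov_states = jax.vmap(cell.map_from_recurrent_state)(states, obs_seq)
        # The wrapped loss never sees the recurrence.
        return loss_fn(params, markov_states, *loss_args)

    return recurrent_loss
```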

Testing

Show that it can solve a few POPGym tasks

Benchmarking (Optional)

Definition of done

We have an example that can solve a few POPGym tasks

Mandatory checklist before making a PR

  • [ ] The success criteria laid down in “Definition of done” are met.
  • [ ] Code is documented - docstrings for methods and classes, static types for arguments.
  • [ ] Code is tested - unit, integration and/or functional tests are added.
  • [ ] Documentation is updated - README, CONTRIBUTING, or other documentation.
  • [ ] All functional tests are green.
  • [ ] Link experiment/benchmarking after implementation (optional).

Links / references / screenshots

smorad · Apr 04 '24 10:04