Mava
[FEATURE] Observation concatenation.
Please describe the purpose of the feature. Is it related to a problem?
When using feedforward policies, partial observability can be mitigated by feeding the policy several recent observations instead of only the current one. This is achieved by concatenating the previous n-1 observations with the current observation.
Describe the solution you'd like
An additional environment wrapper that keeps track of the previous n-1 environment observations and concatenates them with the current observation to form the input for a feedforward policy. For example, if n=1, then only the current observation is used as input, and if n=2, then the current and previous observations are concatenated to form the input.
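A minimal sketch of the core logic, to make the request concrete. It is not tied to Mava's wrapper API: the class name `ObservationConcatenator`, its `reset`/`step` methods, the oldest-first ordering, and the zero-padding of the missing history at the start of an episode are all assumptions made for illustration. In a multi-agent setting, one such buffer per agent would presumably be needed.

```python
from collections import deque

import numpy as np


class ObservationConcatenator:
    """Hypothetical helper: keeps the last n observations and
    returns them concatenated along the last axis, oldest first."""

    def __init__(self, n: int):
        if n < 1:
            raise ValueError("n must be at least 1")
        self._n = n
        # A bounded deque drops the oldest observation automatically.
        self._history = deque(maxlen=n)

    def reset(self, first_obs):
        """Start a new episode from its first observation."""
        first_obs = np.asarray(first_obs)
        self._history.clear()
        # Assumption: the n-1 missing past observations are zero-padded.
        for _ in range(self._n - 1):
            self._history.append(np.zeros_like(first_obs))
        self._history.append(first_obs)
        return self._concat()

    def step(self, obs):
        """Add the newest observation and return the concatenated input."""
        if len(self._history) < self._n:
            raise RuntimeError("call reset() before step()")
        self._history.append(np.asarray(obs))
        return self._concat()

    def _concat(self):
        return np.concatenate(list(self._history), axis=-1)


# Usage with n=2: the policy input is [previous observation, current observation].
stacker = ObservationConcatenator(n=2)
print(stacker.reset(np.array([1.0, 2.0])))  # zero padding, then 1, 2
print(stacker.step(np.array([3.0, 4.0])))   # 1, 2, then 3, 4
```

With n=1 the buffer holds only the current observation, so the output equals the input. Repeating the first observation instead of zero-padding is another reasonable choice for the start of an episode, and which one works better is something the benchmark could report.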
How do we know when implementation of this feature is complete?
Checklist:
- Wrapper is implemented and working
- Tests are written for the wrapper
- The effect on system performance is benchmarked and reported