Mava
Feature: Observation stacking
What?
- Add an environment wrapper that stacks each new observation with past observations for training.
Why?
- This is one of the suggestions in Yu et al. (2021) for Atari environments. Remembering past observations can help the agent choose a better policy.
How?
- Added a new wrapper class that uses a deque to store past observations.
- During reset, we repeat the first observation for the required number of frames.
- During a step, we delete the oldest observation (at the front of the queue) and add the most recent one to the end of the queue.
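The reset and step behaviour described above can be sketched roughly as follows. This is a minimal single-agent illustration, not the wrapper added in this PR: the names `CountingEnv`, `StackObservations`, and `num_frames` are hypothetical, and the real wrapper works with Mava's multi-agent environments and array observations.

```python
from collections import deque


class CountingEnv:
    """Hypothetical toy environment: the observation is the current step count."""

    def __init__(self):
        self.t = 0

    def reset(self):
        self.t = 0
        return [self.t]

    def step(self, action):
        self.t += 1
        return [self.t]


class StackObservations:
    """Illustrative frame-stacking wrapper (assumed interface, not Mava's class)."""

    def __init__(self, env, num_frames=4):
        self.env = env
        self.num_frames = num_frames
        # A bounded deque drops its oldest (leftmost) item when a new one is appended.
        self.frames = deque(maxlen=num_frames)

    def _stacked(self):
        # Return a copy ordered from oldest to newest observation.
        return [list(frame) for frame in self.frames]

    def reset(self):
        obs = self.env.reset()
        self.frames.clear()
        # Repeat the first observation to fill the whole stack.
        for _ in range(self.num_frames):
            self.frames.append(obs)
        return self._stacked()

    def step(self, action):
        obs = self.env.step(action)
        # Appending evicts the oldest observation because of maxlen.
        self.frames.append(obs)
        return self._stacked()


env = StackObservations(CountingEnv(), num_frames=3)
first = env.reset()
second = env.step(action=None)
third = env.step(action=None)
```

Using `deque(maxlen=...)` means the oldest observation is discarded automatically on append, so no explicit delete is needed in `step`.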
Extra
- Closes #788
Codecov Report
Merging #793 (6f92bc7) into develop (999d31e) will not change coverage. The diff coverage is n/a.
@@ Coverage Diff @@
## develop #793 +/- ##
========================================
Coverage 93.12% 93.12%
========================================
Files 167 167
Lines 9253 9253
========================================
Hits 8617 8617
Misses 636 636