rl-control-template
fix: reworked frame skipping and max-pooling for Atari
Pulled frame skipping out of the Gymnasium environment so that consecutive frames can be max-pooled, as done in the DQN Zoo codebase. The agent's stream of experience should now follow the pipeline described below:
On every environment step, the Atari simulator takes 4 steps, repeating the selected action. This simplifies the RL problem and speeds up execution. If the agent loses a life or the episode terminates, the frame-skipping loop ends early and the environment's discount factor is set to 0. The agent receives the total reward obtained during the frame-skipping loop. The last two frames of each frame-skipping loop are max-pooled to handle screen flickering caused by the Atari 2600's hardware limitations. After max-pooling, the frames are resized to (84, 84) and converted to grayscale. At each step, the agent receives a stack of the past 4 observed (not skipped) processed frames, giving an observation shape of (84, 84, 4).

In the diagram below, ~ denotes skipped frames, lowercase letters denote max-pooled frames (e.g. b = max pool(3, 4)), and capital letters denote max-pooled frames after resizing and conversion to grayscale (e.g. C is the processed form of c = max pool(7, 8)).
0    | 1  2  3  4    | 5  6  7  8    | 9  10 11 12   | 13 14 15 16   | (frames)
0    | ~  ~  3  4    | ~  ~  7  8    | ~  ~  11 12   | ~  ~  15 16   | (skipping)
a    | ~  ~  ~  b    | ~  ~  ~  c    | ~  ~  ~  d    | ~  ~  ~  e    | (max-pooling)
A    | ~  ~  ~  B    | ~  ~  ~  C    | ~  ~  ~  D    | ~  ~  ~  E    | (resize and grayscale)
A000 | ~  ~  ~  AB00 | ~  ~  ~  ABC0 | ~  ~  ~  ABCD | ~  ~  ~  BCDE | (stacking)
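
The pipeline in the diagram can be sketched in a few lines of NumPy. This is only an illustration under stated assumptions, not the code from this commit. The names `skipped_step`, `preprocess`, `stack_frames` and the `env_step` callable (returning `(frame, reward, terminated, life_lost)`) are hypothetical. The nearest-neighbour resize and the luma weights are stand-ins for whatever interpolation the real preprocessing uses. Placing the zero padding after the observed frames follows the stacking row as drawn (A000, AB00, ...).

```python
import numpy as np

FRAME_SKIP = 4  # simulator steps per agent step
NUM_STACK = 4   # processed frames per observation


def skipped_step(env_step, action, num_skip=FRAME_SKIP):
    """Repeat `action` and max-pool the last two raw frames.

    `env_step` is a hypothetical callable returning
    (frame, reward, terminated, life_lost).
    """
    frames, total_reward, discount = [], 0.0, 1.0
    for _ in range(num_skip):
        frame, reward, terminated, life_lost = env_step(action)
        frames.append(frame)
        total_reward += reward
        if terminated or life_lost:
            discount = 0.0  # loop ends early, discount is set to 0
            break
    # Element-wise max over the last two frames (or the single frame
    # available if the loop ended after one simulator step).
    pooled = np.max(np.stack(frames[-2:]), axis=0)
    return pooled, total_reward, discount


def preprocess(frame, size=84):
    """Convert an RGB frame to a (size, size) uint8 grayscale image."""
    weights = np.array([0.299, 0.587, 0.114], dtype=np.float32)
    gray = frame.astype(np.float32) @ weights
    # Nearest-neighbour sampling; an assumption made to avoid extra dependencies.
    rows = np.linspace(0, gray.shape[0] - 1, size).round().astype(int)
    cols = np.linspace(0, gray.shape[1] - 1, size).round().astype(int)
    return np.rint(gray[np.ix_(rows, cols)]).clip(0, 255).astype(np.uint8)


def stack_frames(history, num_stack=NUM_STACK):
    """Stack the most recent processed frames, oldest first, zero-padded."""
    recent = list(history)[-num_stack:]
    padding = [np.zeros_like(recent[0])] * (num_stack - len(recent))
    return np.stack(recent + padding, axis=-1)
```

Keeping the three stages as separate functions mirrors the rows of the diagram: `skipped_step` covers skipping and max-pooling, `preprocess` covers resizing and grayscale, and `stack_frames` builds the (84, 84, 4) observation from the history of processed frames.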