fix: reworked frame skipping and max-pooling for Atari

Open panahiparham opened this issue 10 months ago • 0 comments

Pulled frame skipping out of the Gymnasium environment so that consecutive frames can be max-pooled, as done in the DQN Zoo codebase. The agent's stream of experience should now follow the pipeline described below:

On every environment step, the Atari simulator advances 4 frames by repeating the selected action. This simplifies the RL problem and speeds up execution. If the agent loses a life or the episode terminates, the frame-skipping loop ends early and the environment's discount factor is set to 0. The agent receives the total reward accumulated during the frame-skipping loop. The last two frames of each loop are max-pooled to handle the screen flickering caused by the Atari 2600's hardware limitations. After max-pooling, the frames are converted to grayscale and resized to (84, 84). At each step, the agent receives a stack of the 4 most recent observed (not skipped) processed frames, giving an observation of shape (84, 84, 4).

In the diagram below, ~ denotes skipped frames, lowercase letters denote max-pooled frames (e.g. b = max pool(3, 4)), and capital letters denote max-pooled frames after resizing and grayscale conversion (e.g. C is the processed version of c = max pool(7, 8)).

0    | 1  2  3  4    | 5  6  7  8    | 9  10 11  12   | 13 14 15 16   | (frames)
0    | ~  ~  3  4    | ~  ~  7  8    | ~  ~  11  12   | ~  ~  15 16   | (skipping)
a    | ~  ~  ~  b    | ~  ~  ~  c    | ~  ~  ~   d    | ~  ~  ~  e    | (max-pooling)
A    | ~  ~  ~  B    | ~  ~  ~  C    | ~  ~  ~   D    | ~  ~  ~  E    | (resize and grayscale)
A000 | ~  ~  ~  AB00 | ~  ~  ~  ABC0 | ~  ~  ~   ABCD | ~  ~  ~  BCDE | (stacking)
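
The pipeline in the diagram can be sketched in a few lines of NumPy. This is only an illustrative sketch, not the code from this change: `raw_step` is a hypothetical callable that returns `(frame, reward, terminated, life_lost)` for a single simulator frame, the nearest-neighbour resize stands in for whatever interpolation the real wrapper uses, and treating the `0` entries of the stacking row as all-zero frames is an assumption.

```python
from collections import deque
import numpy as np

FRAME_SKIP, STACK_SIZE, OUT_SHAPE = 4, 4, (84, 84)

def preprocess(frames):
    """Max-pool the last two raw RGB frames, then grayscale and resize."""
    pooled = np.max(np.stack(frames[-2:]), axis=0).astype(np.float32)
    # Luminance weights; the real wrapper may use a different conversion.
    gray = pooled @ np.array([0.299, 0.587, 0.114], dtype=np.float32)
    # Nearest-neighbour resize (a stand-in for the real interpolation).
    rows = np.arange(OUT_SHAPE[0]) * gray.shape[0] // OUT_SHAPE[0]
    cols = np.arange(OUT_SHAPE[1]) * gray.shape[1] // OUT_SHAPE[1]
    return np.rint(gray[rows[:, None], cols]).astype(np.uint8)

def skip_step(raw_step, action):
    """Repeat `action` for up to FRAME_SKIP frames and sum the rewards."""
    frames, total_reward, discount = [], 0.0, 1.0
    for _ in range(FRAME_SKIP):
        frame, reward, terminated, life_lost = raw_step(action)
        frames.append(frame)
        total_reward += reward
        if terminated or life_lost:
            discount = 0.0  # end the loop early and zero the discount
            break
    return preprocess(frames), total_reward, discount

class FrameStack:
    """Keeps the last STACK_SIZE processed frames, padded as in the diagram."""

    def __init__(self):
        self.frames = deque(maxlen=STACK_SIZE)

    def reset(self, first_frame):
        self.frames.clear()
        return self.push(first_frame)

    def push(self, frame):
        self.frames.append(frame)
        # Assumption: empty slots are zero frames (A000, AB00, ABC0, ...).
        pad = [np.zeros_like(frame)] * (STACK_SIZE - len(self.frames))
        return np.stack(list(self.frames) + pad, axis=-1)  # (84, 84, 4)
```

A caller would process the reset frame with `preprocess([frame0])`, pass it to `FrameStack.reset`, and then feed each result of `skip_step` to `FrameStack.push` to obtain the agent's observation.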

panahiparham, Dec 09 '24 02:12