Andi Mullenix
Thanks for the responses. So I manually performed some environment steps and displayed their preprocessed forms. An example is below. It looks OK to me. The paper also mentions taking...
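Roughly, the check looked like this (a sketch, not the exact code; `preprocess` is a stand-in for the actual grayscale/crop/downsample step, and the action indices assume Breakout's ALE mapping):
```
import gym
import numpy as np

# Sketch: manually step the env and eyeball each preprocessed frame.
def preprocess(frame):
    gray = frame.mean(axis=2)                     # naive grayscale
    return gray[34:194:2, ::2].astype(np.uint8)   # crop playfield, downsample to 80x80

env = gym.make("Breakout-v0")
obs = env.reset()
for action in [1, 3, 3]:                          # FIRE, LEFT, LEFT in ALE's action set
    obs, reward, done, info = env.step(action)
    print(preprocess(obs))
```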
Nice catch. That does seem like it would limit the network's expressiveness. Unless you already have something running, I'll go ahead and fire up an instance and give it a...
Unfortunately, it seems to have plateaued again. Currently at episode 3866. Here's the code I used:
```
# Note: tf.contrib.layers.fully_connected defaults to activation_fn=tf.nn.relu,
# so the output layer below is also ReLU-activated unless activation_fn=None is passed.
fc1 = tf.contrib.layers.fully_connected(
    flattened, 512,
    activation_fn=tf.nn.relu,
    biases_initializer=tf.constant_initializer(0.0))
self.predictions = tf.contrib.layers.fully_connected(
    fc1, len(VALID_ACTIONS),
    biases_initializer=tf.constant_initializer(0.0))
```
I'm having a hard time wrapping my head around the impact of ending an episode at the loss of a life, given that training uses random samples of replay memory...
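To make the question concrete, here's the piece I'm trying to reason about (a sketch; `gamma` and the values are just illustrative). Each stored transition carries its own done flag, and that flag is what kills the bootstrap term in the target no matter when the transition is sampled:
```
gamma = 0.99

def td_target(reward, max_q_next, done):
    # A transition flagged done (e.g. recorded at a life loss) contributes
    # no bootstrapped future value, regardless of when it is sampled.
    return reward + (0.0 if done else gamma * max_q_next)

print(td_target(0.0, 1.7, done=False))  # 1.683 -- bootstraps from next state
print(td_target(0.0, 1.7, done=True))   # 0.0   -- treated as terminal
```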
On another note, I manually performed a few steps (fire, left, left) while printing out the processed state stack one depth (time step) at a time, and it checks out....
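The check itself was something like this (a sketch; assumes an 84x84x4 state stack with the newest frame last):
```
import numpy as np

# Sketch: print the processed state stack one depth (time step) at a time.
state = np.random.randint(0, 256, size=(84, 84, 4), dtype=np.uint8)
for t in range(state.shape[-1]):
    print("depth", t, state[:, :, t])

# After a step, the stack shifts: drop the oldest frame, append the newest.
new_frame = np.zeros((84, 84), dtype=np.uint8)
state = np.append(state[:, :, 1:], np.expand_dims(new_frame, 2), axis=2)
```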
Ahhh, yeah that makes a lot of sense. Thanks. I wonder then about the relative impact of doing a full episode restart vs. just reinitializing the state stack and anything...
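In code, the two options I mean would look something like this (a sketch; assumes the Atari envs report remaining lives via the `ale.lives` key in `info`, and `preprocess` is the same stand-in as above):
```
import gym
import numpy as np

def preprocess(frame):
    gray = frame.mean(axis=2)
    return gray[34:194:2, ::2].astype(np.uint8)

env = gym.make("Breakout-v0")
obs = env.reset()
state = np.stack([preprocess(obs)] * 4, axis=2)
lives = None

for _ in range(100):
    obs, reward, done, info = env.step(env.action_space.sample())
    if lives is not None and info.get("ale.lives", lives) < lives:
        # Option A: full episode restart (also restores the lost lives):
        #   obs = env.reset()
        # Option B: keep the episode going and just reinitialize
        # the state stack from the current frame:
        state = np.stack([preprocess(obs)] * 4, axis=2)
    lives = info.get("ale.lives", lives)
    if done:
        obs = env.reset()
        state = np.stack([preprocess(obs)] * 4, axis=2)
```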
Awesome! Yes, I will give this a go for the Q-Learning in just a bit here.
Not sure if either of you have encountered this yet, but when a gym monitor is active, attempting to reset the environment when the episode is technically still running results...
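One way around it is to run the episode out before calling reset (a sketch; the monitor directory is just a placeholder):
```
import gym
from gym import wrappers

env = gym.make("Breakout-v0")
env = wrappers.Monitor(env, "/tmp/breakout-monitor")  # placeholder directory

obs = env.reset()
done = False
# ... training loop that may want to bail out of the episode early ...

# Sketch of a workaround: the monitor only allows reset() once the
# episode is done, so run it out with no-ops first.
while not done:
    obs, reward, done, info = env.step(0)  # 0 = NOOP in ALE
env.reset()
```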