fferreres comments

Results: 14 comments of fferreres

DQN solution results peak at ~35 reward

I saw the same with A3C after 1.9 million steps: ![A3C on Breakout](https://aifromscratch.files.wordpress.com/2016/11/picture1.png?quality=80&strip=info&w=720) I attributed it to the fact that I only used 4 learners (2 are hyperthreads, only...

DQN solution results peak at ~35 reward

1) I don't know the right way to add the graph for review in TensorBoard, since at `tf.train.SummaryWriter(..., sess.graph)` time the session hasn't been created yet (it is created much later). If I...

DQN solution results peak at ~35 reward

> Depending on how `conv2d` combines channel filters, maybe history is getting reduced to one state in the first convolutional layer. Maybe `depthwise_conv2d` is relevant or, alternatively, flattening the stack...
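To make the channel question concrete, here is a toy 1-D sketch in plain Python (not TensorFlow's implementation; the function names are made up for illustration). It assumes each frame of the history stack is one input channel. A single standard convolution filter correlates every channel with its own kernel and then sums across channels, while a depthwise convolution keeps one output per channel.

```python
# Toy 1-D illustration of how a convolution filter treats input channels.
# Assumption: each frame in the history stack is one input channel.

def correlate_1d(signal, kernel):
    """Valid-mode cross-correlation of one channel with one kernel."""
    k = len(kernel)
    return [
        sum(signal[i + j] * kernel[j] for j in range(k))
        for i in range(len(signal) - k + 1)
    ]

def standard_conv(channels, kernels):
    """One output filter: per-channel correlation, then a sum across channels."""
    per_channel = [correlate_1d(c, k) for c, k in zip(channels, kernels)]
    return [sum(vals) for vals in zip(*per_channel)]

def depthwise_conv(channels, kernels):
    """Depthwise: per-channel correlation with no sum across channels."""
    return [correlate_1d(c, k) for c, k in zip(channels, kernels)]

history = [
    [1, 2, 3, 4],  # older frame (channel 0)
    [0, 1, 0, 1],  # newer frame (channel 1)
]
kernels = [[1, -1], [2, 0]]  # one kernel per channel

mixed = standard_conv(history, kernels)      # frames merged into one map
separate = depthwise_conv(history, kernels)  # one map per frame

print(mixed)
print(separate)
```

Because each channel has its own kernel weights, the sum in one filter is a weighted mix of the frames rather than a plain average, and a layer with several such filters can learn different mixes. Whether that is enough to preserve the history in this particular network is the open question raised in the comment.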

DQN solution results peak at ~35 reward

Is something like this possible? If the action is not taken, set the target value to be equal to the current Q prediction for that action. For the (one) action taken,...
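A minimal sketch of that idea in plain Python, for a single transition. The comment is cut off before it says what target the taken action gets, so the one-step Q-learning target `reward + gamma * max(q_next)` used here is an assumption, and `build_targets` is a hypothetical helper name.

```python
def build_targets(q_pred, action, reward, q_next, done, gamma=0.99):
    """Return a per-action target vector for one transition.

    Actions that were not taken keep the current prediction as their target,
    so their error is zero. The taken action gets a one-step Q-learning
    target (an assumption, see the note above).
    """
    targets = list(q_pred)  # copy: untaken actions keep their prediction
    bootstrap = 0.0 if done else gamma * max(q_next)
    targets[action] = reward + bootstrap
    return targets

# Example with three actions; action 1 was taken.
q_pred = [0.5, 1.0, -0.2]  # current Q predictions for the state
q_next = [0.0, 2.0, 1.0]   # Q predictions for the next state
targets = build_targets(q_pred, action=1, reward=1.0,
                        q_next=q_next, done=False, gamma=0.5)

# Only the taken action contributes a non-zero error.
errors = [t - q for t, q in zip(targets, q_pred)]
print(targets)
print(errors)
```

With a squared-error loss over the whole output vector, the zero errors mean no gradient flows through the outputs of the untaken actions, which has the same effect as masking the loss to the taken action.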

DQN solution results peak at ~35 reward

Denny, > Really good point and may be the key here. This could make a huge difference. I looked at the paper again and I actually can't find details on...

DQN solution results peak at ~35 reward

@dennybritz, yeah, I have things mixed up. Too many things to try to understand too quickly (new to Python, NumPy, TensorFlow, vim, Machine Learning, RL, and refreshing statistics/probability 101 :-) ).

DQN solution results peak at ~35 reward

Here are a few threads discussing low scores vs. the paper. They can be a source of troubleshooting ideas:
https://github.com/tambetm/simple_dqn/issues/32
https://groups.google.com/forum/m/#!topic/deep-q-learning/JV384mQcylo
https://docs.google.com/spreadsheets/d/1nJIvkLOFk_zEfejS-Z-pHn5OGg-wDYJ8GI5Xn2wsKIk/htmlview
I re-read all of DQN.py (not the notebook) and...
