DeepRL-Agents
Are you sure the Deep Recurrent notebook is correct?
In the notebook I don't see where your recurrent Q-value model gets its trace dimension. You're just reshaping the output of a convnet and feeding it directly into an LSTM. Furthermore, shouldn't you also provide the non-zero initial state determined at play time? That is, the internal state should be stored in the experience buffer and reused during training (see the sketch below for what I mean). Correct me if I'm wrong, please.
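For concreteness, the alternative I have in mind would look roughly like this. It is only a sketch of my proposal, not code from the notebook; `lstm_state` stands for the (c, h) pair returned by the recurrent forward pass, and all names here are made up:

```python
import random


class RecurrentExperienceBuffer:
    """Stores each transition together with the LSTM state that was active
    when the transition was generated, so training could start every trace
    from the same internal state the agent actually had at play time."""

    def __init__(self, capacity=50000):
        self.capacity = capacity
        self.buffer = []

    def add(self, state, action, reward, next_state, done, lstm_state):
        # Drop the oldest transition once the buffer is full.
        if len(self.buffer) >= self.capacity:
            self.buffer.pop(0)
        # lstm_state is the (c, h) tuple from the recurrent forward pass.
        self.buffer.append((state, action, reward, next_state, done, lstm_state))

    def sample(self, batch_size):
        # Uniform sampling; the stored lstm_state would seed initial_state
        # for each sampled trace during training.
        return random.sample(self.buffer, batch_size)
```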
Yep, I thought along similar lines, but it turns out the code works correctly.
- Reshaping issue: here batch_size and trace_length are set to 4 and 8. Each Qnetwork object (main, target) receives batch_size * trace_length = 32 frames. After conv4, the dimensions become (32, 1, 1, 512) = (batch_size * trace_length, w, h, hidden units). The trace dimension is recovered by reshaping this flat tensor before it enters the LSTM (see the sketch after this list).
- Non-zero H0: the initial state is iteratively updated and fed in via feed_dict[network.state]. This state is the 'last hidden state' returned by each LSTM forward pass, as illustrated in the sketch below.
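For reference, the pattern described in both points looks roughly like this. It is a minimal sketch assuming TF1-style APIs; the placeholder and variable names (`conv_flat_in`, `h_size`, etc.) are illustrative, not the notebook's exact identifiers:

```python
import numpy as np
import tensorflow as tf  # TF1-style API assumed

h_size = 512  # number of units coming out of the final conv layer (illustrative)

# conv4 output arrives flattened over batch and trace:
# shape (batch_size * trace_length, h_size).
conv_flat_in = tf.placeholder(tf.float32, [None, h_size], name="conv_flat")
trace_length = tf.placeholder(tf.int32, [], name="trace_length")
batch_size = tf.placeholder(tf.int32, [], name="batch_size")

# The trace dimension is recovered here: the flat (batch * trace, h_size)
# tensor is reshaped to (batch, trace, h_size) before entering the LSTM.
rnn_in = tf.reshape(conv_flat_in, [batch_size, trace_length, h_size])

cell = tf.nn.rnn_cell.BasicLSTMCell(num_units=h_size)
state_in = cell.zero_state(batch_size, tf.float32)  # can be overridden via feed_dict
rnn_out, rnn_state = tf.nn.dynamic_rnn(
    cell=cell, inputs=rnn_in, initial_state=state_in, dtype=tf.float32)

# Play-time loop (schematic): the last hidden state returned by each forward
# pass is fed back in as state_in on the next step, so H0 is non-zero for
# every step after the first one of the episode.
with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    state = (np.zeros([1, h_size]), np.zeros([1, h_size]))  # episode-start H0
    for _ in range(3):  # stand-in for the environment loop
        features = np.random.rand(1, h_size)  # stand-in for conv4 output
        q_values, state = sess.run(
            [rnn_out, rnn_state],
            feed_dict={conv_flat_in: features,
                       trace_length: 1,
                       batch_size: 1,
                       state_in: state})
```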
I had another thought. Isn't the target network unnecessary in this notebook in the first place, since you set the target network equal to the mainDQN right before training?
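To make the question concrete, here is a sketch of the kind of target-update op I'm referring to, assuming the usual layout in this repo's helpers where the first half of the trainables list belongs to the main network and the second half to the target network (the function name below is illustrative). With tau = 1 this is an exact copy, which is what would make a separate target network redundant; with a small tau it is only a slowly lagging copy:

```python
import tensorflow as tf  # TF1-style API assumed


def make_target_update_ops(tf_vars, tau):
    """Blend each target-network variable toward its main-network twin.

    Assumes tf_vars lists main-network variables first, then target-network
    variables in matching order.
    """
    total = len(tf_vars)
    ops = []
    for idx, main_var in enumerate(tf_vars[: total // 2]):
        target_var = tf_vars[idx + total // 2]
        ops.append(target_var.assign(
            tau * main_var.value() + (1.0 - tau) * target_var.value()))
    return ops


# Usage sketch: with tau << 1 the target network trails the main network
# instead of matching it exactly, so it still provides stable targets.
# trainables = tf.trainable_variables()
# target_ops = make_target_update_ops(trainables, tau=0.001)
# for op in target_ops:
#     sess.run(op)
```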