DeepRL-Agents
Are you sure the Deep Recurrent notebook is correct?
In the notebook I don't see where your recurrent Q-value model gets its trace dimension. You're just reshaping the output of a convnet and feeding it directly into an LSTM. Furthermore, shouldn't you also provide the non-zero initial state determined at play time? That is, the internal state should be stored in the experience buffer and reused during training (see the sketch below for what I mean). Correct me if I'm wrong, please.
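For concreteness, the alternative I have in mind would look roughly like this. It is only a sketch of my proposal, not code from the notebook; `lstm_state` stands for the (c, h) pair returned by the recurrent forward pass, and all names here are made up:

```python
import random


class RecurrentExperienceBuffer:
    """Stores each transition together with the LSTM state that was active
    when the transition was generated, so training could start every trace
    from the same internal state the agent actually had at play time."""

    def __init__(self, capacity=50000):
        self.capacity = capacity
        self.buffer = []

    def add(self, state, action, reward, next_state, done, lstm_state):
        # Drop the oldest transition once the buffer is full.
        if len(self.buffer) >= self.capacity:
            self.buffer.pop(0)
        # lstm_state is the (c, h) tuple from the recurrent forward pass.
        self.buffer.append((state, action, reward, next_state, done, lstm_state))

    def sample(self, batch_size):
        # Uniform sampling; the stored lstm_state would seed initial_state
        # for each sampled trace during training.
        return random.sample(self.buffer, batch_size)
```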
Yep, I thought along similar lines, but it turns out the code works correctly.
- Reshaping issue: here batch_size and trace_length are set to 4 and 8. Each Qnetwork object (main, target) receives batch_size * trace_length = 32 frames. After conv4, the dimensions become (32, 1, 1, 512) = (batch_size * trace_length, w, h, hidden units). The trace dimension is recovered by reshaping this flat tensor before it enters the LSTM (see the sketch after this list).
- Non-zero H0: the initial state is iteratively updated and fed in via feed_dict[network.state]. This state is the 'last hidden state' returned by each LSTM forward pass, as illustrated in the sketch below.
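For reference, the pattern described in both points looks roughly like this. It is a minimal sketch assuming TF1-style APIs; the placeholder and variable names (`conv_flat_in`, `h_size`, etc.) are illustrative, not the notebook's exact identifiers:

```python
import numpy as np
import tensorflow as tf  # TF1-style API assumed

h_size = 512  # number of units coming out of the final conv layer (illustrative)

# conv4 output arrives flattened over batch and trace:
# shape (batch_size * trace_length, h_size).
conv_flat_in = tf.placeholder(tf.float32, [None, h_size], name="conv_flat")
trace_length = tf.placeholder(tf.int32, [], name="trace_length")
batch_size = tf.placeholder(tf.int32, [], name="batch_size")

# The trace dimension is recovered here: the flat (batch * trace, h_size)
# tensor is reshaped to (batch, trace, h_size) before entering the LSTM.
rnn_in = tf.reshape(conv_flat_in, [batch_size, trace_length, h_size])

cell = tf.nn.rnn_cell.BasicLSTMCell(num_units=h_size)
state_in = cell.zero_state(batch_size, tf.float32)  # can be overridden via feed_dict
rnn_out, rnn_state = tf.nn.dynamic_rnn(
    cell=cell, inputs=rnn_in, initial_state=state_in, dtype=tf.float32)

# Play-time loop (schematic): the last hidden state returned by each forward
# pass is fed back in as state_in on the next step, so H0 is non-zero for
# every step after the first one of the episode.
with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    state = (np.zeros([1, h_size]), np.zeros([1, h_size]))  # episode-start H0
    for _ in range(3):  # stand-in for the environment loop
        features = np.random.rand(1, h_size)  # stand-in for conv4 output
        q_values, state = sess.run(
            [rnn_out, rnn_state],
            feed_dict={conv_flat_in: features,
                       trace_length: 1,
                       batch_size: 1,
                       state_in: state})
```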
I had another thought. Isn't the target network unnecessary in this notebook in the first place, since you set the target network equal to the mainDQN right before training?
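To make the question concrete, here is a sketch of the kind of target-update op I'm referring to, assuming the usual layout in this repo's helpers where the first half of the trainables list belongs to the main network and the second half to the target network (the function name below is illustrative). With tau = 1 this is an exact copy, which is what would make a separate target network redundant; with a small tau it is only a slowly lagging copy:

```python
import tensorflow as tf  # TF1-style API assumed


def make_target_update_ops(tf_vars, tau):
    """Blend each target-network variable toward its main-network twin.

    Assumes tf_vars lists main-network variables first, then target-network
    variables in matching order.
    """
    total = len(tf_vars)
    ops = []
    for idx, main_var in enumerate(tf_vars[: total // 2]):
        target_var = tf_vars[idx + total // 2]
        ops.append(target_var.assign(
            tau * main_var.value() + (1.0 - tau) * target_var.value()))
    return ops


# Usage sketch: with tau << 1 the target network trails the main network
# instead of matching it exactly, so it still provides stable targets.
# trainables = tf.trainable_variables()
# target_ops = make_target_update_ops(trainables, tau=0.001)
# for op in target_ops:
#     sess.run(op)
```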