etienne87
Another confusion I have about this (because I have little experience with TF): it seems we need two graphs, one for prediction (taking a dynamic_rnn), and one (maybe taking a static...
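A minimal sketch of why two graphs can share one set of weights: a tiny hand-rolled RNN cell (hypothetical, NumPy only, not the actual TF code) stepped frame by frame for prediction gives the same hidden states as a full unroll for training, as long as both paths use the same parameters.

```python
import numpy as np

# Hypothetical toy RNN cell; W_x / W_h stand in for the shared weights.
rng = np.random.default_rng(0)
W_x = rng.normal(size=(4, 8))   # input -> hidden
W_h = rng.normal(size=(8, 8))   # hidden -> hidden

def step(x, h):
    """One RNN step: the 'prediction graph', called one frame at a time."""
    return np.tanh(x @ W_x + h @ W_h)

def unroll(xs, h0):
    """Full unroll over a sequence: the 'training graph' (like dynamic_rnn)."""
    h = h0
    outs = []
    for x in xs:
        h = step(x, h)
        outs.append(h)
    return np.stack(outs)

xs = rng.normal(size=(5, 4))          # T = 5 frames of 4 features
h = np.zeros(8)
stepwise = []
for x in xs:                          # prediction side, frame by frame
    h = step(x, h)
    stepwise.append(h)

# Both paths agree because they share W_x and W_h.
assert np.allclose(np.stack(stepwise), unroll(xs, np.zeros(8)))
```

So the "two graphs" are just two ways of calling the same cell; the weights live in one place.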
@mbz ok! What I mean: in classic A3C, it seems we can just backprop at the end of an episode (T_MAX), by re-using the already computed predictions. On...
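For reference, the backprop at the end of a T_MAX segment uses discounted n-step returns, bootstrapped from the value prediction of the last state (this is the standard A3C recipe; the function name and shapes here are my own sketch, not GA3C's code):

```python
import numpy as np

def n_step_returns(rewards, bootstrap_value, gamma=0.99):
    """Discounted returns R_t = r_t + gamma * R_{t+1} over one T_MAX
    segment, seeded with V(s_T) predicted for the last state."""
    R = bootstrap_value
    returns = np.empty(len(rewards))
    for t in reversed(range(len(rewards))):
        R = rewards[t] + gamma * R
        returns[t] = R
    return returns

# 3 steps of reward, with V(s_3) predicted as 0.5:
r = n_step_returns([1.0, 0.0, 1.0], 0.5, gamma=0.9)
# r[2] = 1 + 0.9*0.5 = 1.45; r[1] = 0.9*1.45 = 1.305; r[0] = 1 + 0.9*1.305
```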
@ricky1203: could you perhaps provide an example/link in context?
@Golly not so much, to be honest. Also I think I first need to test the idea referred to in #16; otherwise the LSTM version will need re-computation of T_MAX steps before each...
Coming back to this problem with a slightly better understanding of variable-length RNNs: I think the easiest way to code the LSTM version is to keep track...
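One way to "keep track" could look like this: a small store of per-agent (h, c) state that the predictor reads before stepping the LSTM and writes back after (all names here are hypothetical, just to illustrate the bookkeeping):

```python
import numpy as np

HIDDEN = 8  # assumed LSTM hidden size for the sketch

class StateStore:
    """Hypothetical per-agent recurrent state store."""
    def __init__(self):
        self.states = {}

    def get(self, agent_id):
        # Unknown agents (or agents whose episode just reset) start at zeros.
        return self.states.get(agent_id,
                               (np.zeros(HIDDEN), np.zeros(HIDDEN)))

    def put(self, agent_id, h, c):
        self.states[agent_id] = (h, c)

    def reset(self, agent_id):
        # Call at episode end so the next episode starts from a fresh state.
        self.states.pop(agent_id, None)

store = StateStore()
h, c = store.get(3)        # fresh agent -> zero state
store.put(3, h + 1.0, c)   # pretend one LSTM step updated h
```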
Anyway, here is a first implementation that works fine if you don't have too many unfinished experiences (of length < Config.TIME_MAX) [here](https://github.com/etienne87/GA3C/blob/lstm/ga3c/NetworkVP.py). I "solved" the issue by padding sequences in...
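The padding idea, sketched outside TF (function and constants are placeholders, not the linked code): zero-pad each variable-length experience sequence up to TIME_MAX and remember the true lengths for later.

```python
import numpy as np

TIME_MAX = 4   # stands in for Config.TIME_MAX
FEATURES = 3   # assumed per-step feature size

def pad_batch(sequences):
    """Zero-pad variable-length sequences to TIME_MAX and
    return the padded batch plus the true lengths."""
    batch = np.zeros((len(sequences), TIME_MAX, FEATURES))
    lengths = np.array([len(s) for s in sequences])
    for i, seq in enumerate(sequences):
        batch[i, :len(seq)] = seq
    return batch, lengths

seqs = [np.ones((2, FEATURES)), np.ones((4, FEATURES))]
batch, lengths = pad_batch(seqs)   # batch: (2, 4, 3); lengths: [2, 4]
```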
Hmm, actually there was still an error in my code: I forgot to mask the loss for the padded inputs! I propose a first fix [here](https://github.com/etienne87/GA3C/blob/lstm_cartpole/ga3c/NetworkVP.py#L152). Apparently this now works better...
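The masking fix boils down to averaging the per-step loss only over real timesteps, so whatever the network outputs on padded steps contributes nothing to the gradient. A NumPy sketch of that idea (not the linked TF code):

```python
import numpy as np

def masked_mean_loss(per_step_loss, lengths):
    """Mean loss over real (non-padded) steps only.
    per_step_loss: (batch, TIME_MAX); lengths: true length per sequence."""
    T = per_step_loss.shape[1]
    mask = np.arange(T)[None, :] < lengths[:, None]   # True on real steps
    return (per_step_loss * mask).sum() / mask.sum()

loss = np.ones((2, 4))
loss[0, 2:] = 100.0            # garbage loss on the padded steps
lengths = np.array([2, 4])
result = masked_mean_loss(loss, lengths)   # the 100s are masked out
```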
I did a quick trial in one of my [branches](https://github.com/etienne87/GA3C/blob/lstm/ga3c/NetworkVP_torch.py). Actually, TF is almost twice as fast, because the naive way I did the vectorized loss probably involves...
Interesting, @ppwwyyxx! My naive implementation gives something like this: I am not sure if the problem is in the batching rather than in the explicit calls &...
I don't understand how the batch can be large while t-max is small: you need to accumulate frames during t-max steps before doing a backprop, right?
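To make the accumulation point concrete, here is a toy version of the idea (all names are placeholders, not GA3C's classes): each trainer buffers transitions and only triggers a training step once t-max of them have arrived.

```python
T_MAX = 5  # assumed segment length for the sketch

class Experiences:
    """Toy buffer: accumulate up to T_MAX transitions, then 'train'."""
    def __init__(self):
        self.buffer = []
        self.train_calls = 0

    def add(self, transition):
        self.buffer.append(transition)
        if len(self.buffer) == T_MAX:
            self.train()

    def train(self):
        self.train_calls += 1   # a real agent would backprop here
        self.buffer.clear()

exp = Experiences()
for step in range(12):
    exp.add((step, 0.0))        # (state, reward) placeholder
# 12 transitions -> training fired twice (at 5 and 10), 2 left buffered
```

So the effective batch fed to the GPU can only grow by gathering segments from many agents at once, not by stretching a single agent past t-max.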