Marco Pleines
It is considered a bad transition because the new state is the initial state of a completely new episode, correct?
Did you figure out an answer, @miguelsuau? Bump @ikostrikov. I started a [discussion](https://discuss.pytorch.org/t/check-flow-of-gradients-concerning-hidden-states-in-a-recurrent-policy/84008) in the PyTorch forums and used Pytorchviz to visualize the backpropagation graph of this and my...
I printed the shape of the inputs to the GRU layer and observed that the sequence lengths vary (probably depending on the episode length). So the max length of the...
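To illustrate why sequence lengths can differ from one batch to the next, here is a minimal sketch, not the repository's code: it splits a flat rollout into episodes wherever a done flag is set and zero-pads them to the longest one. The helper name `split_and_pad`, the scalar observations, and the padding value are assumptions made for the example.

```python
def split_and_pad(obs, dones, pad_value=0.0):
    """Split a flat rollout at done flags and pad every episode to the same length.

    Hypothetical helper for illustration; assumes `obs` is non-empty.
    """
    episodes, current = [], []
    for o, d in zip(obs, dones):
        current.append(o)
        if d:  # the episode ended on this step
            episodes.append(current)
            current = []
    if current:  # trailing episode that has not terminated yet
        episodes.append(current)

    max_len = max(len(e) for e in episodes)
    padded = [e + [pad_value] * (max_len - len(e)) for e in episodes]
    # 1 marks real steps, 0 marks padding that a loss should ignore
    mask = [[1] * len(e) + [0] * (max_len - len(e)) for e in episodes]
    return padded, mask


obs = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6]
dones = [False, False, True, False, True, False]
padded, mask = split_and_pad(obs, dones)
```

With this input the rollout contains episodes of length 3, 2, and 1, so the padded length is 3 and the mask marks which entries are real data.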
In the case of running: ``` python main.py --env-name "PongNoFrameskip-v4" --algo ppo --use-gae --lr 2.5e-4 --clip-param 0.1 --value-loss-coef 0.5 --num-processes 8 --num-steps 512 --num-mini-batch 4 --log-interval 1 --use-linear-lr-decay --entropy-coef 0.01...
I pretty much abandoned this repository to work on my own implementation with more comments and documentation. Still WIP. [https://github.com/MarcoMeter/neroRL/tree/update/sequence_buffer_masked_loss](https://github.com/MarcoMeter/neroRL/tree/update/sequence_buffer_masked_loss) [recurrent policy doc](https://github.com/MarcoMeter/neroRL/blob/update/sequence_buffer_masked_loss/docs/recurrent_policy.md)
@binaryoung Thanks for sharing your findings! A couple of weeks ago I published a baseline/reference implementation that does truncated BPTT. https://github.com/MarcoMeter/recurrent-ppo-truncated-bptt
Hi @WilliamYue37 I was not able to reproduce your issue. I'm wondering whether your model file is corrupted. Try to download it again using this link [mortar_mayhem_grid.nn](https://drive.google.com/file/d/1_XdvqNm69ZjsGOXCM9cJ1ZLGa246ausb/view?usp=drive_link) Does this happen...
Thanks for the insights. I deleted and re-added the pretrained models. Hope this helps.
I downloaded the zip of this repo and was able to reproduce this issue. I'm not sure yet how to fix this. Let's keep this open so that others notice...
 The grey curve is the most problematic one. The IQM already shows strong volatility, while the stratified bootstrapped confidence interval is very narrow. Utilizing less data or further lowering...
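For readers unfamiliar with the two statistics mentioned here, the following is a rough, standard-library-only sketch of an interquartile mean (IQM) and a stratified bootstrap confidence interval. It is not the evaluation code behind the plot: the function names, the number of resamples, and the percentile method are assumptions chosen for the example.

```python
import random


def iqm(scores):
    """Interquartile mean: average of the middle 50% of the sorted scores."""
    s = sorted(scores)
    cut = int(0.25 * len(s))  # number of values dropped from each end
    middle = s[cut:len(s) - cut]
    return sum(middle) / len(middle)


def stratified_bootstrap_ci(scores_per_task, reps=2000, alpha=0.05, seed=0):
    """Percentile CI of the IQM, resampling runs with replacement within each task."""
    rng = random.Random(seed)
    estimates = []
    for _ in range(reps):
        resampled = []
        for runs in scores_per_task:  # one stratum per task
            resampled.extend(rng.choices(runs, k=len(runs)))
        estimates.append(iqm(resampled))
    estimates.sort()
    lower = estimates[int((alpha / 2) * reps)]
    upper = estimates[int((1 - alpha / 2) * reps) - 1]
    return lower, upper


point = iqm([1, 2, 3, 4, 5, 6, 7, 8])
low, high = stratified_bootstrap_ci([[0.2, 0.4, 0.9], [0.1, 0.5, 0.7]])
```

Because the resampling only draws from the runs that were actually collected, the width of the interval reflects the spread among those runs rather than the step-to-step movement of a training curve.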