Reinforcement-learning-with-tensorflow
A3C example fails after updating to TF==1.6
Hi @MorvanZhou, the BipedalWalker A3C example fails to converge after updating TensorFlow. It would be great if we could fix it.
https://github.com/MorvanZhou/Reinforcement-learning-with-tensorflow/blob/master/experiments/Solve_BipedalWalker/A3C.py
W_3 Ep: 7983 | ------- | Pos: 4 | RR: -16.6 | EpR: -16.4 | var: [ 4.245301 15.187426 5.2913938 13.67638 ]
W_1 Ep: 7984 | ------- | Pos: 3 | RR: -16.9 | EpR: -22.1 | var: [1.5639918 9.140908 1.399676 7.786484 ]
W_2 Ep: 7985 | ------- | Pos: 5 | RR: -16.8 | EpR: -14.6 | var: [ 4.6013346 16.964872 5.8146315 15.3746605]
W_4 Ep: 7986 | ------- | Pos: 3 | RR: -16.8 | EpR: -16.9 | var: [2.4117482 9.90723 1.954719 8.533363 ]
W_7 Ep: 7987 | ------- | Pos: 3 | RR: -16.9 | EpR: -19.8 | var: [ 2.0128653 13.1242 2.5133812 11.365165 ]
W_3 Ep: 7988 | ------- | Pos: 4 | RR: -16.8 | EpR: -14.5 | var: [ 1.7905483 10.57028 1.7253214 9.119608 ]
W_6 Ep: 7989 | ------- | Pos: 4 | RR: -16.8 | EpR: -16.8 | var: [ 4.03255 14.209857 4.585601 12.950537]
W_4 Ep: 7990 | ------- | Pos: 4 | RR: -16.7 | EpR: -14.7 | var: [ 3.7650225 14.439074 5.4745865 13.146451 ]
W_1 Ep: 7991 | ------- | Pos: 4 | RR: -16.7 | EpR: -15.6 | var: [ 4.213461 14.894608 4.1767473 13.537503 ]
W_7 Ep: 7992 | ------- | Pos: 3 | RR: -16.6 | EpR: -15.6 | var: [0.88352734 8.910546 0.9521044 7.6020703 ]
W_2 Ep: 7993 | ------- | Pos: 5 | RR: -16.4 | EpR: -13.4 | var: [ 2.968406 13.267687 3.6187844 12.018632 ]
W_3 Ep: 7994 | ------- | Pos: 3 | RR: -16.4 | EpR: -15.7 | var: [ 2.07616 11.160275 1.1988583 9.583065 ]
W_5 Ep: 7995 | ------- | Pos: 2 | RR: -17.1 | EpR: -30.1 | var: [ 0.9681119 10.007403 1.2556459 8.55513 ]
W_4 Ep: 7996 | ------- | Pos: 6 | RR: -16.8 | EpR: -11.4 | var: [ 3.3065507 13.672268 3.9533446 12.384474 ]
W_6 Ep: 7997 | ------- | Pos: 5 | RR: -16.6 | EpR: -12.3 | var: [ 3.542752 12.5449505 3.5135493 11.413937 ]
W_7 Ep: 7998 | ------- | Pos: 4 | RR: -16.4 | EpR: -12.7 | var: [ 3.7254398 14.587961 4.214101 13.035912 ]
W_2 Ep: 7999 | ------- | Pos: 4 | RR: -16.6 | EpR: -20.8 | var: [ 4.3646445 16.331139 5.4474187 14.882863 ]
W_1 Ep: 8000 | ------- | Pos: 2 | RR: -17.1 | EpR: -26.0 | var: [ 2.3089342 10.188143 2.140171 8.682452 ]
W_5 Ep: 8001 | ------- | Pos: 3 | RR: -17.1 | EpR: -18.5 | var: [1.4845458 9.2544775 1.4096836 7.9558926]
W_7 Ep: 8002 | ------- | Pos: 5 | RR: -16.9 | EpR: -12.9 | var: [ 4.1792936 12.500039 3.9695206 11.367695 ]
W_4 Ep: 8003 | ------- | Pos: 4 | RR: -16.9 | EpR: -16.5 | var: [ 2.9577925 13.383015 2.9338877 11.993065 ]
W_3 Ep: 8004 | ------- | Pos: 2 | RR: -17.3 | EpR: -25.1 | var: [1.4824905 9.347416 1.77881 7.8598304]
W_6 Ep: 8005 | ------- | Pos: 6 | RR: -17.3 | EpR: -16.6 | var: [ 4.678798 13.103448 4.7200103 11.855856 ]
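For anyone comparing runs across TensorFlow versions, here is a minimal sketch for pulling the numbers out of a log like the one above. The field layout (`Ep`, `Pos`, `RR`, `EpR`) is inferred only from these pasted lines, and `parse_log` and `summarize` are hypothetical helper names, not part of A3C.py.

```python
import re
from statistics import mean

# Pattern inferred from the pasted worker output; this is an assumption
# about the log format, not something taken from A3C.py.
LINE_RE = re.compile(
    r"^(?P<worker>W_\d+)\s+Ep:\s*(?P<ep>\d+)\s*\|.*?"
    r"\|\s*Pos:\s*(?P<pos>-?\d+)\s*"
    r"\|\s*RR:\s*(?P<rr>-?\d+(?:\.\d+)?)\s*"
    r"\|\s*EpR:\s*(?P<epr>-?\d+(?:\.\d+)?)"
)


def parse_log(text):
    """Return one dict per log line that matches the pattern; skip the rest."""
    rows = []
    for line in text.splitlines():
        match = LINE_RE.match(line.strip())
        if match is None:
            continue
        rows.append({
            "worker": match.group("worker"),
            "ep": int(match.group("ep")),
            "pos": int(match.group("pos")),
            "rr": float(match.group("rr")),
            "epr": float(match.group("epr")),
        })
    return rows


def summarize(rows):
    """Collapse parsed rows into a few numbers that are easy to compare."""
    if not rows:
        return None
    eprs = [row["epr"] for row in rows]
    return {
        "episodes": len(rows),
        "first_ep": rows[0]["ep"],
        "last_ep": rows[-1]["ep"],
        "mean_epr": mean(eprs),
        "min_epr": min(eprs),
        "max_epr": max(eprs),
        "last_rr": rows[-1]["rr"],
    }
```

Running `summarize(parse_log(open("train.log").read()))` on the output of each TensorFlow version (`train.log` is a placeholder file name) gives a quick side-by-side of where the reward sits after the same number of episodes.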
FYI, this is what I'm getting with tensorflow==1.5.0 and 6 workers.

I have updated the code for tf 1.8.0. It works fine when I test it with 8 workers.
Would you be able to post some results? The RNN version is still oscillating, and the feedforward version looks like this (8 workers).

This is from A3C without the RNN. The A3C with RNN may have some issues.
