RL-Learner-Lucky
Which paper? Can you tell me?
I can implement it on DQN successfully, but it failed with PPO.
`temp_grad2 = [g + 0 for g in temp_grads]`: because `temp_grads` are `tf.VariableSynchronization.ON_READ` variables, this operation triggers the on-read event, so `temp_grad2` should be the aggregated value of all replicas on the different devices. But in fact `temp_grad2`...
I ran the code with the default parameters, so it uses your notbasicset files. I can also find some cards whose effects are not implemented.