deep-rl-tensorflow
Settings of the Corridor game
Could you please tell me how you set the reward at each state? It seems that every F state gives a reward, so an agent could simply keep staying on F states until the episode ends and automatically collect the maximum reward. I cannot reproduce the dueling network's results on the corridor game. Could you give me any hints?