gym-tictactoe
Different rewards for O and X?
Sorry if this is a dumb question, but here you set different rewards depending on whether X wins or O wins: https://github.com/haje01/gym-tictactoe/blob/master/gym_tictactoe/env.py#L8
Why is that?
If X's reward for winning is -1, wouldn't that encourage an agent playing as X to always lose?
Thanks.
Please consider the code below:
https://github.com/haje01/gym-tictactoe/blob/84e22fc28fe192ba0040bdd56a697f63d3d4a3d5/examples/td_agent.py#L111-L115
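The idea behind a single shared reward can be sketched as follows. This is a minimal illustration, not the code in `td_agent.py`: it assumes every game is scored from O's point of view (an O win is +1 and, as the question notes, an X win is -1), so one value estimate serves both players, with O maximizing it and X minimizing it. The names `O_REWARD`, `X_REWARD` and `pick_action` are hypothetical.

```python
# Assumed convention: every finished game is scored from O's point of view,
# so an O win is +1 and an X win is -1 (a zero-sum scheme).
O_REWARD = 1
X_REWARD = -1


def pick_action(mark, action_values):
    """Choose a move given estimated values of the positions it leads to.

    `mark` is 'O' or 'X'; `action_values` maps each legal action to the
    estimated value of the resulting position (higher is better for O).
    Hypothetical helper, not the repository's API.
    """
    if mark == 'O':
        # O wants the game to end near +1, so it maximizes the shared value.
        return max(action_values, key=action_values.get)
    # X wants the game to end near -1, so it minimizes the same value.
    return min(action_values, key=action_values.get)


values = {0: 0.2, 1: 0.9, 2: -0.5}
print(pick_action('O', values))  # 1
print(pick_action('X', values))  # 2
```

Under this convention an X agent is not rewarded for losing: a lower value means a better position for X, so it steers toward the -1 outcome.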
Ah, I see. But if I wanted to apply a ready-made algorithm, I suppose I should make the rewards positive and just assign the negative reward to the loser.
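One way to do that without changing the environment is to convert its reward into a per-player reward before handing it to a generic, reward-maximizing algorithm. A minimal sketch, assuming the environment scores O wins as +1, X wins as -1, and draws and non-terminal steps as 0; `player_reward` is a hypothetical helper, not part of gym-tictactoe.

```python
def player_reward(env_reward, mark):
    """Convert a reward scored from O's point of view into the reward
    for the player holding `mark`.

    Hypothetical helper. It assumes O wins are +1, X wins are -1 and
    draws / non-terminal steps are 0, and flips the sign for X so that
    each player sees +1 for its own win and -1 for its own loss.
    """
    if mark not in ('O', 'X'):
        raise ValueError(f"unknown mark: {mark!r}")
    return env_reward if mark == 'O' else -env_reward


# An X win is reported as -1 by the (assumed) environment:
print(player_reward(-1, 'X'))  # 1  (X won)
print(player_reward(-1, 'O'))  # -1 (O lost)
print(player_reward(0, 'X'))   # 0  (draw or game still running)
```

Doing the flip in the training loop keeps the environment's single zero-sum signal intact while letting an off-the-shelf agent for either side simply maximize its own reward.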