gym-tictactoe Different rewards for O and X?

Open drozzy opened this issue 5 years ago • 2 comments

Sorry if this is a dumb question but here you set different rewards based on whether X wins or O wins: https://github.com/haje01/gym-tictactoe/blob/master/gym_tictactoe/env.py#L8

Why is that?

If X's reward for winning is -1, wouldn't that encourage an agent playing as X to always lose?

Thanks.

Mar 06 '20 22:03 drozzy

Please consider the code below:

https://github.com/haje01/gym-tictactoe/blob/84e22fc28fe192ba0040bdd56a697f63d3d4a3d5/examples/td_agent.py#L111-L115

Mar 07 '20 01:03 haje01

Ah, I see. But if I wanted to apply a ready-made algorithm, I suppose I should make the winner's reward positive and assign the negative reward to the loser.

Mar 07 '20 02:03 drozzy
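
For readers who want to try the idea in the last comment, here is a minimal sketch of converting the shared reward into a per-player one. It assumes the environment reports a single zero-sum reward of +1 when O wins, -1 when X wins, and 0 otherwise; only the -1 for X is stated in this thread, the other values are assumptions. The names `reward_for` and `pick_value` are hypothetical helpers, not part of gym-tictactoe.

```python
# Assumed convention: the environment returns one shared reward,
# +1 if O wins, -1 if X wins, 0 otherwise (only the -1 is stated in the thread).

def reward_for(mark, env_reward):
    """Return the reward as seen by `mark` ('O' or 'X').

    Hypothetical helper: flips the sign for X so that a win is
    always positive for the player who achieved it.
    """
    if mark not in ("O", "X"):
        raise ValueError(f"unknown mark: {mark!r}")
    return env_reward if mark == "O" else -env_reward


def pick_value(mark, values):
    """Alternative that keeps the shared reward unchanged.

    With a shared zero-sum value, O prefers the highest value and
    X prefers the lowest. Hypothetical helper.
    """
    if mark not in ("O", "X"):
        raise ValueError(f"unknown mark: {mark!r}")
    return max(values) if mark == "O" else min(values)
```

With the first helper, an off-the-shelf algorithm that always maximizes reward can train either side, since `reward_for("X", -1)` is `1`. With the second, the reward stays as the environment reports it and only the action-selection rule depends on the player.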
