SIMPLE
Issue when the legal actions mask is dependent on the current player
I have a custom environment where the legal actions depend on both the state of the board and the current player. When I try to train my first agent, the legal_actions mask isn't computed correctly for the agent, but it is for the opponent. I'm guessing the issue comes from the code below (found in SelfPlayWrapper). Since legal_actions depends on current_player_num, and agent_player_num != current_player_num, it cannot calculate the correct mask for the agent. Please let me know if you have any ideas on how to fix this.
def continue_game(self):
    observation = None
    reward = None
    done = None
    while self.current_player_num != self.agent_player_num:
        action = self.current_agent.choose_action(self, choose_best_action = False, mask_invalid_actions = True)
        observation, reward, done, _ = super(SelfPlayEnv, self).step(action)
        logger.debug(f'Rewards: {reward}')
        logger.debug(f'Done: {done}')
        if done:
            break
    return observation, reward, done, None
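One way to narrow this down, offered as a sketch rather than a confirmed fix: make the mask an explicit function of the player, so it can be computed for the agent regardless of whose turn the environment currently reports. The ToyEnv class, its parity rule, and the legal_actions_for helper below are hypothetical and not part of SIMPLE; only the names legal_actions and current_player_num are taken from the post above.

```python
from typing import List


class ToyEnv:
    """Hypothetical two-player env: player p may only play empty cells with index parity p."""

    def __init__(self, n_actions: int = 6) -> None:
        self.n_actions = n_actions
        self.board: List[int] = [0] * n_actions  # 0 means the cell is empty
        self.current_player_num = 0

    def legal_actions_for(self, player_num: int) -> List[int]:
        # The mask depends on the board and on the player passed in,
        # not on self.current_player_num, so it can be queried for any player.
        return [
            1 if self.board[i] == 0 and i % 2 == player_num else 0
            for i in range(self.n_actions)
        ]

    @property
    def legal_actions(self) -> List[int]:
        # Mask for whoever is to move right now.
        return self.legal_actions_for(self.current_player_num)

    def step(self, action: int) -> None:
        if not self.legal_actions[action]:
            raise ValueError(f"illegal action {action} for player {self.current_player_num}")
        self.board[action] = self.current_player_num + 1  # mark the cell as taken
        self.current_player_num = (self.current_player_num + 1) % 2


env = ToyEnv()
agent_player_num = 1
# While it is the opponent's turn, the two masks differ:
print(env.legal_actions)                        # mask for the player to move
print(env.legal_actions_for(agent_player_num))  # mask for the agent
```

Logging both masks side by side just before the observation is returned to the agent should show whether the mask was built while current_player_num still pointed at the opponent.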
Did you find a solution to this? I have the same problem when running Test. On the other hand, while running Train, my agent does not care about legal_actions whatsoever: it doesn't call it at all and just chooses a random action number.
Yeah, this is my exact issue. Haven't found a solution yet.