Issue when legal actions mask is dependent on current player

Open AdamLang96 opened this issue 1 year ago • 2 comments

I have a custom environment where the legal actions depend on the state of the board and the current player. When I try to train my first agent, the legal_actions mask isn't computed correctly for the agent, but it is for the opponent. I'm guessing the issue comes from the code below (found in SelfPlayWrapper). Since legal_actions depends on current_player_num, and agent_player_num != current_player_num while this loop runs, it cannot calculate the correct mask for the agent. Please let me know if you have any ideas on how to fix this.

    def continue_game(self):
        observation = None
        reward = None
        done = None
        while self.current_player_num != self.agent_player_num:
            action = self.current_agent.choose_action(self, choose_best_action = False, mask_invalid_actions = True)
            observation, reward, done, _ = super(SelfPlayEnv, self).step(action)
            logger.debug(f'Rewards: {reward}')
            logger.debug(f'Done: {done}')
            if done:
                break

        return observation, reward, done, None

AdamLang96 avatar Sep 24 '23 04:09 AdamLang96

Did you find a solution to this? I have the same problem when running Test. On the other hand, while running Train, my agent does not care about legal_actions whatsoever: it doesn't call it at all and just chooses a random action number.

laymelek avatar Nov 08 '23 22:11 laymelek

Did you find a solution to this? I have the same problem when running Test. On the other hand, while running Train, my agent does not care about legal_actions whatsoever: it doesn't call it at all and just chooses a random action number.

Yeah, this is my exact issue. Haven't found a solution yet.

AdamLang96 avatar Nov 10 '23 04:11 AdamLang96
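
One possible direction, offered as a sketch rather than a confirmed fix: compute the mask for an explicitly named player instead of reading current_player_num implicitly, so the agent's mask can be requested even while it is the opponent's turn. ToyEnv, its even/odd rule, and the legal_actions_for helper below are hypothetical and are not part of SIMPLE; only the names legal_actions, current_player_num, and agent_player_num come from the thread above.

```python
class ToyEnv:
    """Hypothetical two-player env: player 0 may take even cells, player 1 odd cells."""

    def __init__(self, n_actions=6):
        self.n_actions = n_actions
        self.board = [0] * n_actions  # 0 means the cell is still empty
        self.current_player_num = 0

    def legal_actions_for(self, player_num):
        """Mask for an explicitly named player (hypothetical helper, not a SIMPLE API)."""
        return [
            1 if self.board[i] == 0 and i % 2 == player_num else 0
            for i in range(self.n_actions)
        ]

    @property
    def legal_actions(self):
        # Existing interface is kept: defaults to whoever is to move.
        return self.legal_actions_for(self.current_player_num)


env = ToyEnv()
agent_player_num = 1
env.current_player_num = 0  # it is the opponent's turn

opponent_mask = env.legal_actions                    # mask for the player to move
agent_mask = env.legal_actions_for(agent_player_num)  # mask for the agent, whoever's turn it is
print(opponent_mask)
print(agent_mask)
```

Whether this helps depends on where the wrong mask is actually read. If the mask is taken from the environment while current_player_num still points at the opponent, passing the player number explicitly would avoid the mismatch; if the mask is read at some other point, the cause may lie elsewhere.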