poke-env icon indicating copy to clipboard operation
poke-env copied to clipboard

Question about battle.available_switches

Open akashsara opened this issue 2 years ago • 6 comments

Hi @hsahovic, I've been working on a reinforcement learning agent and had a question about the battle.available_switches.

From poke_env/environment/battle.py I can see that battle.available_switches is based off this code snippet:

        if not self.trapped:
            for pokemon in side["pokemon"]:
                if pokemon:
                    pokemon = self._team[pokemon["ident"]]
                    if not pokemon.active and not pokemon.fainted:
                        self._available_switches.append(pokemon)

Now if I'm reading this right, available switches only gives us a list of pokemon that are available to switch to.

However, in player/env_player.py, we have:

Gen8EnvSinglePlayer:

elif 0 <= action - 16 < len(battle.available_switches):
            return self.agent.create_order(battle.available_switches[action - 16])

I feel like this is a little misleading here. Consider this scenario and assume action - 16 = [0,1,2,3,4,5]: Now if we have 3 Pokemon left, say the active Pokemon and the last two, then our actual available Pokemon might be [0, 4, 5] However, battle.available_switches will only have length 3. So if we were to choose action 4 or 5, it would be ignored here.

Am I misinterpreting this or is there something wrong here? Or alternatively, is there a better way to specify which Pokemon we want to switch to?

Edit: This is also a little problematic when we have all 6 Pokemon. Since the active Pokemon will always be an invalid switch option, the length of battle.available_switches will at best be 5. So it won't be possible to switch to the 6th Pokemon at all in that scenario, correct?

Edit 2: So my issue mainly seems to be that I was operating under the assumption that the list of possible actions was fixed [I.E. 22 actions for Gen8 and each integer corresponds to one action regardless of whether that action is legal]. But from what I understand now, this is not the case. Please do correct me if I'm wrong on that though.

I'm planning on setting up my code so that instead of returning an indice it returns the action itself (the particular pokemon for a switch or the specific move otherwise)

akashsara avatar Jun 11 '22 21:06 akashsara

Okay so apologies if the previous message was a little rambly, I was half asleep haha. I dug a little deeper and got an example for you to make it a bit clearer.

Start of the battle:

> battle.available_switches
[sableye (pokemon object) [Active: False, Status: None], starmie (pokemon object) [Active: False, Status: None], 
sceptile (pokemon object) [Active: False, Status: None], aggron (pokemon object) [Active: False, Status: None], 
clefable (pokemon object) [Active: False, Status: None]]
> battle.team
{'p1: Aggron': aggron (pokemon object) [Active: False, Status: None], 
'p1: Sableye': sableye (pokemon object) [Active: False, Status: None], 
'p1: Starmie': starmie (pokemon object) [Active: False, Status: None], 
'p1: Sceptile': sceptile (pokemon object) [Active: False, Status: None], 
'p1: Victini': victini (pokemon object) [Active: True, Status: None], 
'p1: Clefable': clefable (pokemon object) [Active: False, Status: None]}

After 1 Pokemon Faints:

> battle.available_switches
[sableye (pokemon object) [Active: False, Status: None], victini (pokemon object) [Active: False, Status: None], 
clefable (pokemon object) [Active: False, Status: None], aggron (pokemon object) [Active: False, Status: None]]
> battle.team
{'p1: Aggron': aggron (pokemon object) [Active: False, Status: None], 
'p1: Sableye': sableye (pokemon object) [Active: False, Status: None], 
'p1: Starmie': starmie (pokemon object) [Active: False, Status: FNT], 
'p1: Sceptile': sceptile (pokemon object) [Active: True, Status: None], 
'p1: Victini': victini (pokemon object) [Active: False, Status: None], 
'p1: Clefable': clefable (pokemon object) [Active: False, Status: None]}

So now the problem here is if I wanted to switch to, say Aggron, I would use an action of 16 since Aggron is the first Pokemon in battle.team. However, when I pass 16 to action_to_move(), we go into this block:

elif 0 <= action - 16 < len(battle.available_switches):
            return self.agent.create_order(battle.available_switches[action - 16])

Which would return battle.available_switches[0], which would actually be Sableye, not Aggron. And this change in order makes it difficult for the model to take actions properly (since we would be getting state information from battle.team). Now in my code, accounting for this won't be too difficult, I'd just need to define a custom action_to_move() function. But it would be nice if there was an easier way to do this.

tl;dr: battle.available_switches uses a different order than battle.team. This causes problems in the action_to_move() function which uses battle.available_switches to choose a Pokemon.

akashsara avatar Jun 12 '22 04:06 akashsara

Hey,

if you solved this could you please share it so I can fix it as well?

mancho2000 avatar Jun 12 '22 15:06 mancho2000

Hey @akashsara,

Thanks for opening this issue and taking the time to look into the detailed implementation. You are right in your conclusion: the gym API is not expecting actions based on the order of mons in battle.team, but on the order of mons in battle.available_switches - this is also true for battle.available_moves. This is not a bug, but a feature: available_switches is meant to be used as the source of truth for possible switch actions, and it makes sense for its ordering to be used in this context.

If you want to use an ordering based on battle.team, implementing a custom version of action_to_move is the way to go. Alternatively, you could instead add a mapping mechanism to your model.

Let me know what you think :)

hsahovic avatar Jun 12 '22 16:06 hsahovic

Hi @hsahovic, Thanks for the information. I agree that having available_switches as a source of truth makes sense, but I don't know if it makes sense to use that ordering in action_to_move(). From my perspective at least, it forces a user to define a custom action_to_move/use a mapping if they want to be able to switch to a particular Pokemon. At the same time I can see why we need available_switches since not using it might lead to invalid switches. But in either case, it might help if this was mentioned somewhere in the docs.

@mancho2000, Sure. I haven't created yet but I will post the modified code here when I'm done.

akashsara avatar Jun 12 '22 21:06 akashsara

I also found a potential issue while defining my custom action_to_move. The currently implemented version does the following:

        ...
        elif (
            action < 4
            and action < len(battle.available_moves)
            and not battle.force_switch
        ):
            return self.agent.create_order(battle.available_moves[action])
       ...
        else:
            return self.agent.choose_random_move(battle)

Specifically, it lets you pick a move only if battle.force_switch is False. However, if you have no Pokemon on the bench, you won't be able to switch Pokemon at all. So despite performing a legal action, the code automatically returns a random move.

From my test these were the values at one particular point in a battle:

battle.force_switch = True
battle.available_moves = []
battle.available_switches = []

Edit: I've reworked this by doing force_switch = (len(battle.available_switches) > 0) and battle.force_switch and using the same not force_switch condition in each of the if statements. However since available_moves is still blank, it's problematic.

akashsara avatar Jun 13 '22 04:06 akashsara

@mancho2000 here's the custom _action_to_move function I'm using.

# -*- coding: utf-8 -*-
from poke_env.environment.battle import Battle
from poke_env.player.battle_order import BattleOrder, ForfeitBattleOrder
from poke_env.player.env_player import Gen8EnvSinglePlayer

class Gen8EnvSinglePlayerFixed(Gen8EnvSinglePlayer):
    """
    Fixes an issue with inconsistencies in the order of items
    in battle.team vs battle.available_switches
    and battle.active_pokemon.moves vs battle.available_moves
    Ref: https://github.com/hsahovic/poke-env/issues/292
    """

    def __init__(self, *args, **kwargs):
        super(Gen8EnvSinglePlayerFixed, self).__init__(*args, **kwargs)

    def _action_to_move(self, action: int, battle: Battle) -> BattleOrder:
        """Converts actions to move orders.

        The conversion is done as follows:

        action = -1:
            The battle will be forfeited.
        0 <= action < 4:
            The actionth available move in battle.active_pokemon.moves is 
            executed.
        4 <= action < 8:
            The action - 4th available move in battle.active_pokemon.moves is 
            executed, with z-move.
        8 <= action < 12:
            The action - 8th available move in battle.active_pokemon.moves is 
            executed, with mega-evolution.
        12 <= action < 16:
            The action - 12th available move in battle.active_pokemon.moves is 
            executed, while dynamaxing.
        16 <= action < 22
            The action - 16th available switch in battle.team is executed.

        If the proposed action is illegal, a random legal move is performed.

        :param action: The action to convert.
        :type action: int
        :param battle: The battle in which to act.
        :type battle: Battle
        :return: the order to send to the server.
        :rtype: str
        """
        moves = list(battle.active_pokemon.moves.values())
        team = list(battle.team.values())
        force_switch = (len(battle.available_switches) > 0) and battle.force_switch
        # We use the ID of each move here since in some cases
        # The same moves might be in both battle.available_moves
        # and battle.active_pokemon.moves but they may not be the same object
        available_move_ids = [move.id for move in battle.available_moves]
        available_z_moves = [move.id for move in battle.active_pokemon.available_z_moves]

        if action == -1:
            return ForfeitBattleOrder()
        # Special case for Struggle since it's never a part of pokemon.moves
        elif (
            action < 4
            and action < len(moves)
            and not force_switch
            and len(available_move_ids) == 1
            and available_move_ids[0] == "struggle"
        ):
            return self.create_order(battle.available_moves[0])
        elif (
            action < 4
            and action < len(moves)
            and not force_switch
            and moves[action].id in available_move_ids
        ):
            return self.create_order(moves[action])
        elif (
            battle.can_z_move
            and battle.active_pokemon
            and 0 <= action - 4 < len(moves)
            and not force_switch
            and moves[action - 4].id in available_z_moves
        ):
            return self.create_order(moves[action - 4], z_move=True)
        elif (
            battle.can_mega_evolve
            and 0 <= action - 8 < len(moves)
            and not force_switch
            and moves[action - 8].id in available_move_ids
        ):
            return self.create_order(moves[action - 8], mega=True)
        elif (
            battle.can_dynamax
            and 0 <= action - 12 < len(moves)
            and not force_switch
            and moves[action - 12].id in available_move_ids
        ):
            return self.create_order(moves[action - 12], dynamax=True)
        elif (
            not battle.trapped
            and 0 <= action - 16 < len(team)
            and team[action - 16] in battle.available_switches
        ):
            return self.create_order(team[action - 16])
        else:
            return self.choose_random_move(battle)

akashsara avatar Jun 15 '22 07:06 akashsara