stable-baselines3-contrib Custom Environment

Open Zaibali9999 opened this issue 1 year ago • 6 comments

I need a custom wrapper around the environment that tracks when invalid actions are masked and modifies the information sent to the agent accordingly.

Feb 13 '23 17:02 Zaibali9999

How can I mask multiple invalid actions at each step of an episode when training PPO with stable_baselines3? I have 40 discrete actions, and when the chosen action is invalid the agent should fall back to the next-best action, presumably by implementing a prediction function. How can I do that?
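One way to read "mask, then take the next-best action" is at the level of the policy's action distribution: invalid actions get probability zero and the remaining probabilities are renormalized. The sketch below illustrates only that arithmetic in plain Python. `masked_probabilities` and the example logits are hypothetical, and this is not the stable_baselines3 implementation.

```python
import math


def masked_probabilities(logits, invalid_actions):
    """Softmax over `logits` with invalid actions forced to probability 0."""
    invalid = set(invalid_actions)
    valid = [i for i in range(len(logits)) if i not in invalid]
    if not valid:
        raise ValueError("every action is masked; nothing to choose from")
    # Subtract the largest valid logit for numerical stability.
    shift = max(logits[i] for i in valid)
    exps = [0.0 if i in invalid else math.exp(logits[i] - shift)
            for i in range(len(logits))]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical example: 40 discrete actions, three of them currently invalid.
logits = [0.1 * i for i in range(40)]
probs = masked_probabilities(logits, invalid_actions=[39, 38, 5])
```

Because the valid probabilities still sum to 1, the result can be sampled from during training or passed to an argmax for a greedy "next-best" choice.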

Feb 13 '23 17:02 Zaibali9999

    # Added rule: only applicable mutations are allowed.
    invalid_actions = self.not_applicable_mutations()
    if action_index in invalid_actions:
        print("invalid action chosen: ", action_index)
        probs = self.calculate_probibity()
        # Zero out the probability of every invalid action.
        for i in range(len(probs[0])):
            if i in invalid_actions:
                probs[0][i] = 0.0
        # Pick the most probable remaining (valid) action.
        action_index = np.argmax(probs[0])
        print("action changed to: ", action_index)

Is this a legitimate way to replace an invalid action with a valid one in the env's step function?

Feb 13 '23 17:02 Zaibali9999

Hi, I'm working on a custom env in which some actions are invalid in certain scenarios, so I want to prevent my agent from taking invalid actions at those specific steps. I'm not sure where to make the change: should I write a wrapper (and if so, how), or is choosing a valid action inside the step function the right way?

Feb 13 '23 18:02 Zaibali9999

invalid_actions = self.not_applicable_actions()
if action_index in invalid_actions:
    print("invalid action chosen: ", action_index)
    probs = self.calculate_probibity()
    # Zero out the probability of every invalid action.
    for i in range(len(probs[0])):
        if i in invalid_actions:
            probs[0][i] = 0.0
    # Pick the most probable remaining (valid) action.
    action_index = np.argmax(probs[0])
    print("this next best action: ", action_index)

Feb 13 '23 18:02 Zaibali9999

invalid_actions = self.not_applicable_actions()
if action_index in invalid_actions:
    print("invalid action chosen: ", action_index)
    probs = self.calculate_probibity()
    # Zero out the probability of every invalid action.
    for i in range(len(probs[0])):
        if i in invalid_actions:
            probs[0][i] = 0.0
    # Pick the most probable remaining (valid) action.
    action_index = np.argmax(probs[0])
    print("this next best action: ", action_index)

This code is in the step function; it takes the next-best valid action when the agent chooses an invalid one.
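The fallback described here can be written as a small standalone function, which makes it easier to test outside the environment. This is a minimal sketch in plain Python; `next_best_valid_action` and the example probabilities are hypothetical and not part of any library.

```python
def next_best_valid_action(probs, invalid_actions):
    """Return the index of the most probable action that is not invalid."""
    invalid = set(invalid_actions)
    valid = [i for i in range(len(probs)) if i not in invalid]
    if not valid:
        raise ValueError("no valid action available")
    return max(valid, key=lambda i: probs[i])


# Hypothetical probabilities for a 5-action problem; action 1 is invalid.
probs = [0.05, 0.40, 0.10, 0.30, 0.15]
chosen = next_best_valid_action(probs, invalid_actions=[1])
```

Compared with zeroing entries and calling `np.argmax`, this version leaves `probs` unmodified and fails loudly when every action is invalid. An argmax over an all-zero array would silently return index 0, which may itself be invalid.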

Feb 13 '23 18:02 Zaibali9999

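import gym
import numpy as np


# gym.Wrapper rather than gym.ActionWrapper, because step() is overridden directly.
class MaskInvalidActions(gym.Wrapper):
    """Replace an invalid action with the most probable valid one."""

    def __init__(self, env):
        super().__init__(env)
        # No invalid actions are known before the first step.
        self.invalid_actions = []

    def reset(self, **kwargs):
        return self.env.reset(**kwargs)

    def step(self, action):
        # Env-specific helper returning the currently invalid action indices.
        invalid_actions = self.env.not_actions()
        if action in invalid_actions:
            print("invalid action chosen: ", action)
            probs = self.env.calculate_probibity()
            # Zero out the probability of every invalid action.
            for i in range(len(probs[0])):
                if i in invalid_actions:
                    probs[0][i] = 0.0
            # Take the most probable remaining (valid) action.
            action = int(np.argmax(probs[0]))
            print("action changed to: ", action)

        obs, reward, done, info = self.env.step(action)

        # Update invalid actions
        self.invalid_actions = info.get('invalid_actions', self.invalid_actions)

        return obs, reward, done, info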

Is this the right way?
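One caveat with swapping the action inside a wrapper: the algorithm only sees the action it sampled, so PPO's update credits the reward to that action rather than to the one actually executed. For comparison, sb3-contrib's `MaskablePPO` works from masks instead of corrected actions: the environment exposes an `action_masks()` method (or is wrapped in `ActionMasker`) returning one boolean per action, with `True` meaning valid. The sketch below only shows how such a mask could be built from a list of invalid indices. `build_action_mask` is a hypothetical helper, no sb3 calls are made, and in practice the mask would usually be returned as a NumPy boolean array.

```python
N_ACTIONS = 40  # number of discrete actions mentioned in the thread


def build_action_mask(invalid_actions, n_actions=N_ACTIONS):
    """Return a list of booleans: True where the action is allowed."""
    invalid = set(invalid_actions)
    return [i not in invalid for i in range(n_actions)]


# Hypothetical step in which actions 0, 7 and 39 are not applicable.
mask = build_action_mask(invalid_actions=[0, 7, 39])
```

With a mask like this the policy never samples an invalid action in the first place, so the stored action and the executed action stay the same.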

Feb 13 '23 18:02 Zaibali9999
