stable-baselines3-contrib
Custom Environment
A custom wrapper around the environment that tracks when invalid actions are masked and modifies the information sent to the agent accordingly
Masking multiple invalid actions on each step of an episode in the PPO algorithm using stable_baselines3: with 40 discrete actions, the agent should choose the next-best action. How can I implement a prediction function that does that?
`# Added rule: only applicable mutations are allowed
invalid_actions = self.not_applicable_mutations()
if action_index in invalid_actions:
    print("invalid action received:", action_index)
    probs = self.calculate_probibity()
    # Zero out the probability of every invalid action
    for i in range(len(probs[0])):
        if i in invalid_actions:
            probs[0][i] = 0.0
    # Fall back to the most probable remaining action
    action_index = np.argmax(probs[0])
    print("action changed to:", action_index)
`
Is this a legitimate way to replace an invalid action with a valid one inside the env's step function?
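The fallback in the snippet above can be sketched as a standalone helper. This is only an illustration under assumptions: `next_best_valid_action` is a hypothetical name, `probs` is assumed to be an array of shape `(1, n)` or `(n,)` like the one `calculate_probibity()` appears to return, and invalid entries are set to `-inf` instead of `0.0` so that a valid action with probability zero still wins over an invalid one.

```python
import numpy as np


def next_best_valid_action(probs, invalid_actions):
    """Return the index of the most probable action not in invalid_actions.

    Hypothetical helper: `probs` may have shape (1, n) or (n,).
    """
    # np.array(...) copies, so the caller's probabilities are left untouched
    scores = np.array(probs, dtype=float).reshape(-1)
    invalid = np.asarray(list(invalid_actions), dtype=int)
    if len(set(invalid.tolist())) >= scores.size:
        raise ValueError("every action is masked; no valid action to fall back to")
    # -inf guarantees an invalid action can never be the argmax
    scores[invalid] = -np.inf
    return int(np.argmax(scores))
```

For example, `next_best_valid_action([[0.1, 0.6, 0.3]], {1})` skips action 1 and returns 2.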
Hi, I'm working on a custom env in which some actions are invalid in certain scenarios, so I want to prevent my agent from taking invalid actions on specific steps. I'm not sure where the change belongs: do I need to write a wrapper (and if so, how?), or is choosing a valid action inside the step function the right way?
invalid_actions = self.not_applicable_actions()
if action_index in invalid_actions:
    print("invalid action received:", action_index)
    probs = self.calculate_probibity()
    # Zero out the probability of every invalid action
    for i in range(len(probs[0])):
        if i in invalid_actions:
            probs[0][i] = 0.0
    # Fall back to the most probable remaining action
    action_index = np.argmax(probs[0])
    print("this next best action:", action_index)
This code is in the step function; it takes the next-best valid action whenever the agent chooses an invalid one.
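A commonly used alternative to swapping the action inside the step function is to mask on the policy side: the logits of invalid actions are set to `-inf` before the softmax, so those actions get exactly zero probability and are never sampled. The following is a minimal NumPy sketch of that idea, not code from stable_baselines3; `masked_softmax` and its arguments are hypothetical names, and the 40-action size is taken from the question.

```python
import numpy as np


def masked_softmax(logits, valid_mask):
    """Softmax over `logits` in which entries with valid_mask == False get probability 0.

    Hypothetical helper for illustration only.
    """
    logits = np.asarray(logits, dtype=float)
    valid_mask = np.asarray(valid_mask, dtype=bool)
    if not valid_mask.any():
        raise ValueError("at least one action must be valid")
    # Invalid actions get a logit of -inf, i.e. exp(-inf) == 0 after the softmax
    masked = np.where(valid_mask, logits, -np.inf)
    masked = masked - masked.max()  # subtract the max for numerical stability
    exp = np.exp(masked)
    return exp / exp.sum()


# Example with 40 discrete actions, of which actions 3 and 7 are invalid
logits = np.zeros(40)
valid_mask = np.ones(40, dtype=bool)
valid_mask[[3, 7]] = False
probs = masked_softmax(logits, valid_mask)
```

With uniform logits, the two masked actions end up with probability 0 and the remaining 38 actions share the probability mass equally.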
import gym
import numpy as np


# gym.Wrapper instead of gym.ActionWrapper, because step() is overridden directly
class MaskInvalidActions(gym.Wrapper):
    def __init__(self, env):
        super().__init__(env)
        # The original referenced this attribute without ever assigning it
        self.invalid_actions = []

    def reset(self, **kwargs):
        return self.env.reset(**kwargs)

    def step(self, action):
        invalid_actions = self.env.not_actions()
        if action in invalid_actions:
            print("invalid action received:", action)
            probs = self.env.calculate_probibity()
            # Zero out the probability of every invalid action
            for i in range(len(probs[0])):
                if i in invalid_actions:
                    probs[0][i] = 0.0
            # Fall back to the most probable remaining action
            action = int(np.argmax(probs[0]))
            print("action changed to:", action)
        obs, reward, done, info = self.env.step(action)
        # Update invalid actions
        self.invalid_actions = info.get("invalid_actions", self.invalid_actions)
        return obs, reward, done, info
Is this the right way?
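For comparison, sb3-contrib provides `MaskablePPO`, which, as far as I know, reads a boolean mask per step (via an `action_masks()` method on the env or the `ActionMasker` wrapper) instead of having the action rewritten inside `step`; please check the sb3-contrib docs for the exact usage. The sketch below only shows how such a mask could be built from a list of invalid actions. `ToyMutationEnv`, `build_action_mask`, and the hard-coded invalid actions are hypothetical stand-ins, and the block deliberately avoids importing gym or sb3-contrib.

```python
import numpy as np

N_ACTIONS = 40  # number of discrete actions, as in the question


def build_action_mask(invalid_actions, n_actions=N_ACTIONS):
    """Return a boolean array where True marks an action that is currently valid."""
    mask = np.ones(n_actions, dtype=bool)
    mask[np.asarray(list(invalid_actions), dtype=int)] = False
    return mask


class ToyMutationEnv:
    """Hypothetical stand-in for the custom env; only the masking parts are shown."""

    def not_applicable_actions(self):
        # Assumption: the real env computes this from its current state
        return [2, 5, 11]

    def action_masks(self):
        # Boolean mask of shape (N_ACTIONS,), recomputed on every call
        return build_action_mask(self.not_applicable_actions())


env = ToyMutationEnv()
mask = env.action_masks()
```

Keeping the mask computation in one method means the env stays the single source of truth for which actions are applicable at the current step.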