
action constraints in PPO

Open niklasnolte opened this issue 4 years ago • 4 comments

Hi, what is the best way to implement action constraints in a PPOAgent? For a QPolicy I can use observation_and_action_constraint_splitter. Is there something equivalent for PPO policies?

niklasnolte avatar Mar 01 '20 18:03 niklasnolte
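
For reference, a minimal sketch of the splitter pattern the question refers to, as it is used with Q-based agents. The observation keys 'state' and 'valid_actions' are made-up names for illustration; the agent construction is only indicated in comments:

```python
import tensorflow as tf

# Hypothetical observation layout: the environment returns a dict with the
# network input under 'state' and a boolean mask of legal actions under
# 'valid_actions' (both key names are assumptions for this sketch).
def observation_and_action_constraint_splitter(observation):
    return observation['state'], observation['valid_actions']

# The splitter is passed to the Q-based agent/policy, which applies the mask
# when selecting actions, e.g.:
# agent = dqn_agent.DqnAgent(
#     time_step_spec, action_spec,
#     q_network=q_net, optimizer=optimizer,
#     observation_and_action_constraint_splitter=observation_and_action_constraint_splitter)
```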

For an environment with discrete actions you could follow a similar pattern. In the case of continuous actions it gets a bit trickier; you could, for example, use a truncated normal for the action distribution.

oars avatar Mar 02 '20 16:03 oars
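
One way to realize the "similar pattern" for discrete actions is to mask the logits before building the Categorical distribution inside a custom actor distribution network. A minimal sketch, assuming the validity mask comes from the observation; the wiring into PPOAgent is omitted:

```python
import tensorflow as tf
import tensorflow_probability as tfp

def masked_categorical(logits, valid_mask):
    # Set the logits of invalid actions to a large negative value so the
    # resulting Categorical assigns them (numerically) zero probability.
    neg = tf.constant(-1e9, dtype=logits.dtype)
    masked_logits = tf.where(tf.cast(valid_mask, tf.bool), logits, neg)
    return tfp.distributions.Categorical(logits=masked_logits)

# Example: three actions, the middle one is currently illegal.
dist = masked_categorical(
    logits=tf.constant([[0.1, 2.0, -0.3]]),
    valid_mask=tf.constant([[1, 0, 1]]))
print(dist.probs_parameter())  # probability of the masked action is ~0
```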

#216 - for continuous actions

kuanghuei avatar Mar 02 '20 22:03 kuanghuei
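
For the continuous case mentioned above, a minimal sketch of replacing the usual Normal with a truncated one; the bounds and parameter names are placeholders, and in TF-Agents this would live in a custom projection network for the actor distribution network:

```python
import tensorflow_probability as tfp

def bounded_normal(loc, scale, low=-1.0, high=1.0):
    # Samples are restricted to [low, high] by construction, so box
    # constraints on a continuous action space are never violated.
    return tfp.distributions.TruncatedNormal(loc=loc, scale=scale, low=low, high=high)
```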

I have the discrete case, but I was wondering about the technical part: is there a feature that implements the observation_and_action_constraint_splitter functionality for PPOAgents? Given that you added the label, I guess not yet.

niklasnolte avatar Mar 07 '20 17:03 niklasnolte

Hello @niklasnolte, I want to train my PPO agent on a custom environment with a discrete action space, and its performance will be compared with a DQN agent. Have you figured out how to apply action constraints to a PPO agent? Thanks a lot!

edit: For anyone who encountered the same problem, I found a possible solution in #452.

JasonHuang2000 avatar Apr 03 '22 09:04 JasonHuang2000