Deep-Reinforcement-Learning-Practice



PPO stochastic policy

Open davinca opened this issue 5 years ago • 0 comments

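For a continuous control task, if there are multiple available actions (say 6), what does the final layer of the actor output when PPO uses a stochastic policy?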

Jun 06 '19 15:06 davinca
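
One common setup for this case, offered as a sketch rather than as what this repository necessarily does: the actor's final layer outputs one Gaussian mean per action dimension (6 values here), paired with a standard deviation per dimension. In this sketch the standard deviation comes from a separate, state-independent `log_std` parameter, which is an assumption; some implementations output it from a second head instead. The action is sampled from the resulting diagonal Gaussian, and the log-probability used in the PPO ratio is the sum over dimensions. `STATE_DIM`, `HIDDEN`, and all weight names below are hypothetical and chosen only for illustration.

```python
import numpy as np

ACTION_DIM = 6   # from the question: 6 continuous action dimensions
STATE_DIM = 8    # hypothetical observation size, for illustration only
HIDDEN = 32      # hypothetical hidden-layer width

rng = np.random.default_rng(0)

# Hypothetical actor parameters: one hidden layer, then a linear "mean" head.
W1 = rng.normal(0.0, 0.1, size=(STATE_DIM, HIDDEN))
b1 = np.zeros(HIDDEN)
W_mu = rng.normal(0.0, 0.1, size=(HIDDEN, ACTION_DIM))
b_mu = np.zeros(ACTION_DIM)
# Assumed state-independent log standard deviation, one entry per dimension.
log_std = np.zeros(ACTION_DIM)


def actor_forward(state):
    """Return Gaussian parameters (mean, std), each of shape (ACTION_DIM,)."""
    h = np.tanh(state @ W1 + b1)
    mean = h @ W_mu + b_mu          # final layer: 6 means, no squashing here
    std = np.exp(log_std)           # exp keeps the std strictly positive
    return mean, std


def sample_action(mean, std):
    """Draw one action from the diagonal Gaussian N(mean, diag(std^2))."""
    return mean + std * rng.standard_normal(mean.shape)


def log_prob(action, mean, std):
    """Log-density of a diagonal Gaussian: sum over independent dimensions."""
    z = (action - mean) / std
    return float(np.sum(-0.5 * z**2 - np.log(std) - 0.5 * np.log(2.0 * np.pi)))


state = rng.normal(size=STATE_DIM)   # stand-in for an environment observation
mean, std = actor_forward(state)
action = sample_action(mean, std)
logp = log_prob(action, mean, std)   # feeds the PPO probability ratio
```

If the environment has bounded actions, the sampled action is typically clipped or squashed (for example with `tanh`) before being applied; that step is left out of the sketch.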