Deep-Reinforcement-Learning-Practice

PPO stochastic policy

Open davinca opened this issue 5 years ago • 0 comments

For a continuous control task with multiple available actions (say, six), what does the last layer of the actor output when PPO uses a stochastic policy?

davinca, Jun 06 '19 15:06