Reinforcement-learning-with-tensorflow
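In actor-critic, the critic predicts a value. Could it instead be designed to predict the values of all actions (an action-value distribution)?
Then the value of the corresponding action would be taken to compute v and v'.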
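A minimal sketch of the idea in the question follows. It assumes discrete actions and a critic that outputs one value per action, Q(s, ·). It then indexes the chosen action's entry to form v = Q(s, a) and v' = Q(s', a'), which gives the TD error r + γ·v' − v. The names `critic_q_values`, `td_error_sarsa` and `td_error_expected`, and the toy linear critic, are hypothetical illustrations rather than code from this repository. Plain Python is used instead of TensorFlow so that only the indexing logic is shown.

```python
# Sketch under stated assumptions: discrete actions, and a critic that maps a
# state to a list of action values [Q(s, a_0), Q(s, a_1), ...].
# The linear "critic" below is hypothetical and only illustrates the indexing.

def critic_q_values(state, weights):
    """Hypothetical linear critic: one row of weights per action."""
    return [sum(w * x for w, x in zip(row, state)) for row in weights]


def td_error_sarsa(state, action, reward, next_state, next_action,
                   weights, gamma=0.9, done=False):
    """TD error using the value of the action actually taken:
    delta = r + gamma * Q(s', a') - Q(s, a), with a' sampled from the actor."""
    v = critic_q_values(state, weights)[action]  # pick Q(s, a) from the vector
    if done:
        v_next = 0.0  # no bootstrap from a terminal state
    else:
        v_next = critic_q_values(next_state, weights)[next_action]  # Q(s', a')
    return reward + gamma * v_next - v


def td_error_expected(state, action, reward, next_state, next_probs,
                      weights, gamma=0.9, done=False):
    """Variant: average Q(s', .) under the actor's policy pi(. | s')
    instead of using a single sampled a'."""
    v = critic_q_values(state, weights)[action]
    if done:
        v_next = 0.0
    else:
        q_next = critic_q_values(next_state, weights)
        v_next = sum(p * q for p, q in zip(next_probs, q_next))
    return reward + gamma * v_next - v
```

For example, with weights `[[1.0, 0.0], [0.0, 1.0]]`, state `[1.0, 2.0]`, action `1`, reward `1.0`, next state `[3.0, 0.5]`, next action `0` and gamma `0.9`, `td_error_sarsa` gives roughly 1.7 (1 + 0.9 × 3 − 2). In the first variant a' should come from the actor, so the critic tracks the actor's own action values. Taking max over Q(s', ·) instead would estimate the greedy policy's value, which is a different target.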