unrecall

Results 2 issues of unrecall

Hi jimkon, in the original paper, the action used to train the critic net comes from the full policy. But in your master branch, the action is given only by the target actor...

When I change discount_factor from 0.95 to 1.0, the AC algorithm can't converge to an optimal policy. Besides, sometimes Continuous MountainCar Actor Critic Solution works at first 30 or 40...
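
For context on the discount_factor question, here is a minimal sketch of how the discount factor changes the scale of the return an actor-critic method has to estimate. It is not taken from the repository and is not a diagnosis of the reported behaviour. The function `discounted_return` and the 200-step episode with a reward of -1 per step are assumptions made for illustration only.

```python
def discounted_return(rewards, discount_factor):
    """Sum of rewards weighted by discount_factor**t, accumulated back to front."""
    total = 0.0
    for reward in reversed(rewards):
        total = reward + discount_factor * total
    return total


# Hypothetical episode (an assumption, not data from the issue):
# a penalty of -1 on each of 200 steps.
rewards = [-1.0] * 200

# With 0.95 the return is bounded in magnitude by 1 / (1 - 0.95) = 20,
# no matter how long the episode runs.
bounded = discounted_return(rewards, 0.95)

# With 1.0 the return is the plain sum of rewards, so its magnitude
# grows linearly with the episode length.
undiscounted = discounted_return(rewards, 1.0)

print(bounded, undiscounted)
```

With a discount factor of 1.0 the critic's targets scale with episode length instead of staying bounded, which can make value estimates harder to fit. Whether that explains the non-convergence described here would need to be checked against the actual code.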