unrecall issues


Results: 2 issues of unrecall

Why does the target action a' in Q(s', a') for training the critic net come directly from the target actor net?

Hi jimkon, in the original paper, the action for training the critic net comes from the full policy. But in your master, the action is just given by the target actor...
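For readers unfamiliar with the term, the following is a minimal sketch of the variant the question describes, where the target action a' is taken straight from the target actor, as in standard DDPG. It is not code from the repository under discussion: `td_target`, the toy `target_actor` and `target_critic` callables, and the `gamma` value are all hypothetical stand-ins. The "full policy" alternative mentioned in the issue is not shown, since the truncated snippet does not specify it.

```python
import numpy as np

def td_target(reward, next_state, done, target_actor, target_critic, gamma=0.99):
    """Bootstrapped critic target y = r + gamma * (1 - done) * Q'(s', a')."""
    # a' is produced directly by the target actor (the behaviour the issue asks about).
    next_action = target_actor(next_state)
    # Q'(s', a') is evaluated by the target critic.
    next_q = target_critic(next_state, next_action)
    # Terminal transitions (done == 1) do not bootstrap.
    return reward + gamma * (1.0 - done) * next_q

# Hypothetical toy functions standing in for the two target networks.
target_actor = lambda s: -0.5 * s
target_critic = lambda s, a: -(s ** 2) - (a ** 2)

y = td_target(reward=1.0, next_state=np.array([2.0]), done=0.0,
              target_actor=target_actor, target_critic=target_critic, gamma=0.9)
y_terminal = td_target(reward=1.0, next_state=np.array([2.0]), done=1.0,
                       target_actor=target_actor, target_critic=target_critic, gamma=0.9)
```

Swapping in a different action-selection rule would only change the line that computes `next_action`; the rest of the target stays the same.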

Misconvergence in Continuous MountainCar with actor-critic solution

When I change discount_factor from 0.95 to 1.0, the AC algorithm can't converge to an optimal policy. Besides, sometimes the Continuous MountainCar Actor Critic Solution works for the first 30 or 40...