Calculating the state value function from the state-action value function

Open Fjoelsak opened this issue 2 years ago • 0 comments

Hi, I'm a little confused about why you take only the q value of the best action and use it as the state value. According to the relationship between v and q, the state value should be the expectation of the q values over the actions, weighted by the policy's action probabilities: v_pi(s) = sum_a pi(a|s) * q_pi(s, a). The two agree only when the policy is greedy, i.e. puts all probability on the best action. Best regards
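To make the difference concrete, here is a minimal sketch in plain Python. The q values, the epsilon-greedy policy, and the function name `state_value_from_q` are made up for illustration and are not taken from the repository. It only shows that the policy-weighted average and the max over actions differ unless the policy is greedy.

```python
def state_value_from_q(q_values, policy_probs):
    """Return v_pi(s) = sum_a pi(a|s) * q_pi(s, a) for a single state."""
    return sum(p * q for p, q in zip(policy_probs, q_values))


# Hypothetical q values for one state with three actions.
q_values = [1.0, 2.0, 4.0]
n_actions = len(q_values)
best_action = q_values.index(max(q_values))

# Assumed epsilon-greedy policy: explore uniformly with probability epsilon.
epsilon = 0.3
eps_greedy_probs = [
    epsilon / n_actions + ((1.0 - epsilon) if a == best_action else 0.0)
    for a in range(n_actions)
]

# Purely greedy policy: all probability mass on the best action.
greedy_probs = [1.0 if a == best_action else 0.0 for a in range(n_actions)]

v_eps_greedy = state_value_from_q(q_values, eps_greedy_probs)
v_greedy = state_value_from_q(q_values, greedy_probs)
v_max = max(q_values)

print("v under epsilon-greedy policy:", v_eps_greedy)
print("v under greedy policy:        ", v_greedy)
print("max_a q(s, a):                ", v_max)
```

Under these assumptions the greedy value equals `max(q_values)`, while the epsilon-greedy value is strictly lower. So taking the best action's q value as the state value is consistent only if the evaluated policy is itself greedy.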

Jun 02 '23 14:06 Fjoelsak