Machine-Learning-and-Data-Science

Calculating the state value function from the state-action value function

Open · Fjoelsak opened this issue 2 years ago · 0 comments

Hi, I'm a little confused about why you take only the Q value of the best action and use it as the state value. According to the relationship between v and q, the state value should be the average of the Q values over all actions, weighted by the probability the policy assigns to each action. Best regards
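To make the distinction concrete, here is a minimal Python sketch for a single state. The function names are hypothetical and not taken from the repository. The relationship in question is $v_\pi(s) = \sum_a \pi(a \mid s)\, q_\pi(s, a)$. Taking $\max_a q(s, a)$ agrees with it only when the policy puts all of its probability on a maximizing action, as with a greedy policy or in the optimal case $v_*(s) = \max_a q_*(s, a)$.

```python
def state_value_expected(q_values, policy_probs):
    """v_pi(s) = sum_a pi(a|s) * q_pi(s, a) for one state."""
    if len(q_values) != len(policy_probs):
        raise ValueError("need one probability per action")
    if abs(sum(policy_probs) - 1.0) > 1e-9:
        raise ValueError("policy probabilities must sum to 1")
    return sum(p * q for p, q in zip(policy_probs, q_values))


def state_value_max(q_values):
    """max_a q(s, a): matches the expectation only under a greedy policy."""
    return max(q_values)


def greedy_policy(q_values):
    """Put all probability on the first action with the highest Q value."""
    best = max(range(len(q_values)), key=lambda a: q_values[a])
    return [1.0 if a == best else 0.0 for a in range(len(q_values))]


def epsilon_greedy_policy(q_values, epsilon):
    """Spread epsilon uniformly over all actions; the rest goes to the greedy action."""
    n = len(q_values)
    probs = [epsilon / n] * n
    best = max(range(n), key=lambda a: q_values[a])
    probs[best] += 1.0 - epsilon
    return probs


if __name__ == "__main__":
    q = [1.0, 3.0, 2.0]  # illustrative Q values for three actions
    print("max_a q:        ", state_value_max(q))
    print("greedy policy:  ", state_value_expected(q, greedy_policy(q)))
    print("eps-greedy 0.1: ", state_value_expected(q, epsilon_greedy_policy(q, 0.1)))
    print("uniform policy: ", state_value_expected(q, [1 / 3] * 3))
```

With these values the greedy policy reproduces the max (3.0), while the epsilon-greedy policy gives about 2.9 and the uniform policy about 2.0, so using the max overestimates the value of any policy that still explores.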

Fjoelsak · Jun 02 '23 14:06