Machine-Learning-and-Data-Science
Calculating the state-value function from the state-action value function
Hi, I'm a little confused about why you take the q value of the best action and use it as the state value. According to the relationship between v and q, the state value should be the q values averaged over the actions, weighted by the policy's action probabilities. Best regards
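To make the question concrete, here is a minimal sketch of the relationship I mean, v_pi(s) = sum over a of pi(a|s) * q_pi(s, a), for a single state. The function name, the q values, and the policy probabilities are all made up for illustration and are not taken from the repository. As far as I understand, the expectation and the max only coincide when the policy is greedy with respect to q.

```python
def state_value_from_q(q_values, policy_probs):
    """Return v(s) = sum_a pi(a|s) * q(s, a) for one state.

    q_values:     q(s, a) for each action a (hypothetical numbers below)
    policy_probs: pi(a|s) for each action a, must sum to 1
    """
    if len(q_values) != len(policy_probs):
        raise ValueError("need one probability per action")
    if abs(sum(policy_probs) - 1.0) > 1e-9:
        raise ValueError("policy probabilities must sum to 1")
    return sum(p * q for p, q in zip(policy_probs, q_values))


# Illustrative values for one state with three actions (assumed, not from the repo).
q = [1.0, 2.0, 4.0]
stochastic_pi = [0.25, 0.25, 0.5]  # a policy that still explores
greedy_pi = [0.0, 0.0, 1.0]        # all probability on the best action

v_expected = state_value_from_q(q, stochastic_pi)  # policy-weighted average
v_greedy = state_value_from_q(q, greedy_pi)        # expectation under the greedy policy
v_max = max(q)                                     # "q value of the best action"

print(v_expected, v_greedy, v_max)
```

With these numbers the policy-weighted average is lower than the max, while the greedy policy gives exactly the max. So taking the best action's q value would be consistent if the code is evaluating a greedy policy, but not a stochastic one.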