
Equation of state-action value function in seminar_vi week 02

AI-Ahmed opened this issue 2 years ago • 0 comments

Hello there! First, thank you so much for providing us with such an amazing curriculum. Second, I want to leave a note here about something that took me some time to research. To start, I want to clarify a few points regarding the value function.

  • Value Function: how good a specific action or a specific state is for your agent. (Deeplizard: Reinforcement Learning - Developing Intelligent Agents; Prof. Steve Brunton: Model-Based Reinforcement Learning)

  • The value (utility) of a state $s$: $V^*(s)$ = expected utility starting in $s$ and acting optimally, while the value (utility) of a q-state $(s, a)$: $Q^*(s, a)$ = expected utility starting out having taken action $a$ from state $s$ and (thereafter) acting optimally (Prof. Pieter Abbeel, lecture 08, CS188 Artificial Intelligence, UC Berkeley, Spring 2013, slide 22).

  • When we talk about Value Iteration, there are two value functions involved (Deeplizard: Reinforcement Learning - Developing Intelligent Agents; John Schulman: Markov Decision Processes and Solving Finite Problems, slide 11; Prof. Pieter Abbeel, lecture 08, CS188 Artificial Intelligence, UC Berkeley, Spring 2013, slide 50):

    1. state-value function: $v_\pi(s) = \mathbb{E}[G_t \mid S_t = s]$ <---- It gives the value of a "state" under $\pi$.
    2. state-action value function: $Q_\pi(s, a) = \mathbb{E}[G_t \mid S_t = s, A_t = a]$ <---- How good it is for an agent to take a given action ($a$) from a given state ($s$) while following the policy ($\pi$).

Therefore, we have two value iteration formulas (state-value and state-action value). Honestly, I had never seen the two mixed together before the way they are in seminar_vi.ipynb. How would you expect me to compute get_action_value without a 2D table over states and actions, by plugging $V(s')$ into the equation?
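Just to make my reading of the notebook concrete, here is a minimal sketch of how I understand that mixed update: the Q-value for a single $(s, a)$ pair is computed on the fly from the current state-value table, so no 2D Q-table is ever stored. The `mdp.get_next_states` and `mdp.get_reward` methods below are my assumptions about the interface, not necessarily the notebook's exact API.

```python
def get_action_value(mdp, state_values, state, action, gamma):
    """Compute Q_i(s, a) = sum_{s'} P(s'|s,a) * [r(s,a,s') + gamma * V_i(s')].

    NOTE: mdp.get_next_states and mdp.get_reward are assumed helper methods
    (returning {s': P(s'|s,a)} and r(s, a, s') respectively); they may not
    match the notebook's MDP class exactly.
    """
    q_value = 0.0
    for next_state, prob in mdp.get_next_states(state, action).items():
        reward = mdp.get_reward(state, action, next_state)
        q_value += prob * (reward + gamma * state_values[next_state])
    return q_value
```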

So, if I want to do Value Iteration as in seminar_vi.ipynb, the equation should look like this: $V_i(s) = \sum_{s'} P(s' \mid s, a) \cdot [r(s, a, s') + \gamma V_i(s')]$ instead of $Q_i(s, a) = \sum_{s'} P(s' \mid s, a) \cdot [r(s, a, s') + \gamma V_i(s')]$
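As a rough sketch of how I would write a full sweep that keeps only a state-value table (reusing the hypothetical `get_action_value` above, and including the max over actions that value iteration requires), assuming `mdp.get_all_states` and `mdp.get_possible_actions` exist with those names:

```python
def value_iteration_step(mdp, state_values, gamma):
    """One sweep of V_{i+1}(s) = max_a sum_{s'} P(s'|s,a) [r(s,a,s') + gamma V_i(s')].

    mdp.get_all_states and mdp.get_possible_actions are assumed method names,
    not necessarily the notebook's actual interface.
    """
    new_values = {}
    for state in mdp.get_all_states():
        actions = mdp.get_possible_actions(state)
        if not actions:
            # Terminal states have no actions and zero future value.
            new_values[state] = 0.0
            continue
        new_values[state] = max(
            get_action_value(mdp, state_values, state, action, gamma)
            for action in actions
        )
    return new_values
```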

And if I have a table of state-action values (a.k.a. a Q-table), I can use the state-action value function to calculate it: $$Q_i(s, a) = \sum_{s', a'} P(s' \mid s, a) \cdot [r(s, a, s') + \gamma Q_i(s', a')]$$
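For comparison, here is a sketch of a pure Q-table sweep (no separate V table), written with the usual greedy $\max_{a'}$ over next actions in place of the $V_i(s')$ lookup. The `mdp` method names are the same assumptions as in the sketches above.

```python
def q_iteration_step(mdp, q_values, gamma):
    """One sweep of Q_{i+1}(s,a) = sum_{s'} P(s'|s,a) [r(s,a,s') + gamma * max_{a'} Q_i(s',a')].

    q_values is a dict keyed by (state, action); mdp method names are assumed.
    """
    new_q = {}
    for state in mdp.get_all_states():
        for action in mdp.get_possible_actions(state):
            total = 0.0
            for next_state, prob in mdp.get_next_states(state, action).items():
                reward = mdp.get_reward(state, action, next_state)
                # Greedy backup: best current Q-value over actions available in s'.
                best_next = max(
                    (q_values.get((next_state, a2), 0.0)
                     for a2 in mdp.get_possible_actions(next_state)),
                    default=0.0,
                )
                total += prob * (reward + gamma * best_next)
            new_q[(state, action)] = total
    return new_q
```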

Please correct me if I'm wrong; I would be more than happy to hear your thoughts.

AI-Ahmed · Jun 12 '22 19:06