
Equation of state-action value function in seminar_vi week 02

AI-Ahmed opened this issue 2 years ago • 0 comments

Hello there! First, thank you so much for providing us with such an amazing curriculum. Second, I want to leave a note here about something that took me some time to research. To start, I want to clarify a few points regarding the value function.

  • Value Function: how good a specific action or a specific state is for your agent. (Deeplizard: Reinforcement Learning - Developing Intelligent Agents; Prof. Steve Brunton: Model-Based Reinforcement Learning)

  • The value (utility) of a state $s$: $V^*(s)$ = expected utility starting in $s$ and acting optimally, while the value (utility) of a q-state $(s, a)$: $Q^*(s, a)$ = expected utility starting out having taken action $a$ from state $s$ and (thereafter) acting optimally (Prof. Pieter Abbeel, lecture 08, CS188 Artificial Intelligence, UC Berkeley, Spring 2013, slide 22).

  • When we talk about Value Iteration, there are two value functions involved (Deeplizard: Reinforcement Learning - Developing Intelligent Agents; John Schulman: Markov Decision Processes and Solving Finite Problems, slide 11; Prof. Pieter Abbeel, lecture 08, CS188 Artificial Intelligence, UC Berkeley, Spring 2013, slide 50):

    1. state-value function: $v_\pi(s) = \mathbb{E}[G_t \mid S_t = s]$ <---- It gives the value of a "state" under $\pi$.
    2. state-action value function: $Q_\pi(s, a) = \mathbb{E}[G_t \mid S_t = s, A_t = a]$ <---- How good it is for an agent to take a given action ($a$) from a given state ($s$) while following the policy ($\pi$).

Therefore, we have two value iteration formulas (state-value and state-action value). Honestly, I had never seen the two mixed together before the way they are in seminar_vi.ipynb. How would you expect me to compute get_action_value without a 2D table over states and actions, by plugging $V(s')$ into the equation?
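Just to make my reading of the notebook concrete, here is a minimal sketch of how I understand that mixed update: the Q-value for a single $(s, a)$ pair is computed on the fly from the current state-value table, so no 2D Q-table is ever stored. The `mdp.get_next_states` and `mdp.get_reward` methods below are my assumptions about the interface, not necessarily the notebook's exact API.

```python
def get_action_value(mdp, state_values, state, action, gamma):
    """Compute Q_i(s, a) = sum_{s'} P(s'|s,a) * [r(s,a,s') + gamma * V_i(s')].

    NOTE: mdp.get_next_states and mdp.get_reward are assumed helper methods
    (returning {s': P(s'|s,a)} and r(s, a, s') respectively); they may not
    match the notebook's MDP class exactly.
    """
    q_value = 0.0
    for next_state, prob in mdp.get_next_states(state, action).items():
        reward = mdp.get_reward(state, action, next_state)
        q_value += prob * (reward + gamma * state_values[next_state])
    return q_value
```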

So, if I want to do Value Iteration as in seminar_vi.ipynb, the equation should look like this: $V_i(s) = \sum_{s'} P(s' \mid s, a) \cdot [r(s, a, s') + \gamma V_i(s')]$ instead of $Q_i(s, a) = \sum_{s'} P(s' \mid s, a) \cdot [r(s, a, s') + \gamma V_i(s')]$
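As a rough sketch of how I would write a full sweep that keeps only a state-value table (reusing the hypothetical `get_action_value` above, and including the max over actions that value iteration requires), assuming `mdp.get_all_states` and `mdp.get_possible_actions` exist with those names:

```python
def value_iteration_step(mdp, state_values, gamma):
    """One sweep of V_{i+1}(s) = max_a sum_{s'} P(s'|s,a) [r(s,a,s') + gamma V_i(s')].

    mdp.get_all_states and mdp.get_possible_actions are assumed method names,
    not necessarily the notebook's actual interface.
    """
    new_values = {}
    for state in mdp.get_all_states():
        actions = mdp.get_possible_actions(state)
        if not actions:
            # Terminal states have no actions and zero future value.
            new_values[state] = 0.0
            continue
        new_values[state] = max(
            get_action_value(mdp, state_values, state, action, gamma)
            for action in actions
        )
    return new_values
```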

And if I have a table of state-action values (a.k.a. a Q-table), I can use the state-action value function to calculate it: $$Q_i(s, a) = \sum_{s', a'} P(s' \mid s, a) \cdot [r(s, a, s') + \gamma Q_i(s', a')]$$
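For comparison, here is a sketch of a pure Q-table sweep (no separate V table), written with the usual greedy $\max_{a'}$ over next actions in place of the $V_i(s')$ lookup. The `mdp` method names are the same assumptions as in the sketches above.

```python
def q_iteration_step(mdp, q_values, gamma):
    """One sweep of Q_{i+1}(s,a) = sum_{s'} P(s'|s,a) [r(s,a,s') + gamma * max_{a'} Q_i(s',a')].

    q_values is a dict keyed by (state, action); mdp method names are assumed.
    """
    new_q = {}
    for state in mdp.get_all_states():
        for action in mdp.get_possible_actions(state):
            total = 0.0
            for next_state, prob in mdp.get_next_states(state, action).items():
                reward = mdp.get_reward(state, action, next_state)
                # Greedy backup: best current Q-value over actions available in s'.
                best_next = max(
                    (q_values.get((next_state, a2), 0.0)
                     for a2 in mdp.get_possible_actions(next_state)),
                    default=0.0,
                )
                total += prob * (reward + gamma * best_next)
            new_q[(state, action)] = total
    return new_q
```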

Please correct me if I'm wrong; I would be more than happy to hear your thoughts.

AI-Ahmed · Jun 12 '22 19:06