[UPDATE] UNIT 1: the two main approaches...

Open romuvt opened this issue 1 year ago • 0 comments

When you define stochastic policies, you write:

$$\pi(a \mid s) = P[A \mid s]$$

The LHS is a specific real number in [0, 1], while the RHS is a probability distribution, isn't it? So I think it should be something like $\pi(a \mid s) = P[A_t = a \mid S_t = s]$. Alternatively, the RHS could state in words that this is the probability of choosing action $a$ in state $s$.
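To illustrate the distinction, here is a minimal Python sketch. It assumes a toy tabular policy that applies a softmax to hand-picked preference scores. The states, actions, scores, and function names are all hypothetical and are not taken from the course. `policy_distribution(s)` returns the whole distribution $\pi(\cdot \mid s) = P[A_t = \cdot \mid S_t = s]$, while `pi(a, s)` returns the single number $\pi(a \mid s) = P[A_t = a \mid S_t = s]$.

```python
import math
import random

# Hypothetical toy setup (not from the course): 2 states, 3 actions,
# and hand-picked preference scores turned into probabilities by a softmax.
ACTIONS = ["left", "stay", "right"]
PREFERENCES = {
    "s0": [1.0, 0.0, -1.0],
    "s1": [0.0, 0.0, 2.0],
}


def policy_distribution(s):
    """pi(. | s): the whole distribution P[A_t = . | S_t = s], one probability per action."""
    prefs = PREFERENCES[s]
    m = max(prefs)  # subtract the max before exponentiating, for numerical stability
    exps = [math.exp(p - m) for p in prefs]
    total = sum(exps)
    return {a: e / total for a, e in zip(ACTIONS, exps)}


def pi(a, s):
    """pi(a | s) = P[A_t = a | S_t = s]: a single real number in [0, 1]."""
    return policy_distribution(s)[a]


def sample_action(s, rng=random):
    """Draw one action A_t ~ pi(. | s)."""
    dist = policy_distribution(s)
    return rng.choices(list(dist), weights=list(dist.values()), k=1)[0]


if __name__ == "__main__":
    print(policy_distribution("s0"))  # a dict of probabilities summing to 1
    print(pi("left", "s0"))           # one float in [0, 1]
    print(sample_action("s1"))        # one action name
```

The dict returned by `policy_distribution` corresponds to the RHS as originally written, which is a distribution. The float returned by `pi` corresponds to the LHS, which is a scalar. This mismatch is the one the issue points out.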

romuvt, Jul 17 '24 10:07