reinforcement-learning-an-introduction Redundant discount factor

Open c-lyu opened this issue 11 months ago • 1 comment

Issue Description: The reproduction code for the Gridworld environment, located here, appears to handle the discount factor inconsistently in its policy evaluation step. Sutton's book does not mention multiplying the value by a discount factor at this point.

Expected Behavior:

Input π
Initialize an array V(s) = 0, for all s ∈ S^+ 
Repeat
    ∆ ← 0
    For each s ∈ S:
        v ← V(s)
        V(s) ← ∑_a π(a | s) ∑_{s', r} p(s', r | s, a) [r + γ V(s')]
        ∆ ← max(∆, |v − V(s)|)
until ∆ < θ
Output V ≈ v_π
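For reference, the update above can be sketched in plain Python. This is a minimal illustration, not the repository's code: it assumes a 4x4 gridworld with terminal states in two opposite corners, a reward of -1 per move, and an equiprobable random policy, and every name in it (`GRID_SIZE`, `step`, `policy_evaluation`, and so on) is hypothetical. The discount factor is exposed as a `gamma` parameter so that the undiscounted case is simply `gamma=1.0`, where the multiplication has no effect.

```python
GRID_SIZE = 4  # assumption: 4x4 grid
ACTIONS = [(-1, 0), (1, 0), (0, -1), (0, 1)]  # up, down, left, right
ACTION_PROB = 1.0 / len(ACTIONS)  # assumption: equiprobable random policy


def is_terminal(state):
    # Assumption: the two terminal states sit in opposite corners.
    return state in ((0, 0), (GRID_SIZE - 1, GRID_SIZE - 1))


def step(state, action):
    """Deterministic transition: every move yields reward -1,
    and a move that would leave the grid keeps the state unchanged."""
    row, col = state
    next_row, next_col = row + action[0], col + action[1]
    if not (0 <= next_row < GRID_SIZE and 0 <= next_col < GRID_SIZE):
        next_row, next_col = row, col
    return (next_row, next_col), -1.0


def policy_evaluation(gamma=1.0, theta=1e-4):
    """In-place iterative policy evaluation, following the pseudocode above."""
    values = [[0.0] * GRID_SIZE for _ in range(GRID_SIZE)]  # V(s) = 0 for all s
    while True:
        delta = 0.0
        for row in range(GRID_SIZE):
            for col in range(GRID_SIZE):
                if is_terminal((row, col)):
                    continue  # terminal values stay at 0
                old_value = values[row][col]
                new_value = 0.0
                for action in ACTIONS:
                    (next_row, next_col), reward = step((row, col), action)
                    # pi(a|s) * [r + gamma * V(s')]; p(s', r | s, a) = 1 here
                    new_value += ACTION_PROB * (reward + gamma * values[next_row][next_col])
                values[row][col] = new_value
                delta = max(delta, abs(old_value - new_value))
        if delta < theta:
            return values


if __name__ == "__main__":
    for value_row in policy_evaluation(gamma=1.0):
        print(" ".join(f"{value:7.2f}" for value in value_row))
```

Calling `policy_evaluation(gamma=0.9)` instead gives the discounted variant, which makes it easy to compare the two behaviours under the same assumptions.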

Jul 13 '23 12:07 c-lyu
