ReinforcementLearning.jl

`TDLearner` time step parameter

Open · baedan opened this issue 3 years ago · 3 comments

`TDLearner(; approximator, γ=1.0, method, n=0)`: the `n` in the constructor is, strangely, not the number of time steps used but that number minus 1. This is really confusing.

baedan · Jun 06 '22 13:06

;( I struggled with it too... In the end, I decided to follow TD(λ) with λ = 0. So maybe it's better to rename the keyword argument?

findmyway · Jun 06 '22 14:06

Isn't TD(λ) defined separately in `TDλReturnLearner`?

Couldn't the `n` here simply be the n of n-step TD methods? It's simple enough to change, but I'm not sure how one would introduce a breaking change. (Funnily enough, the example in RLAnIntroduction.jl takes `n` to be the number of time steps, lol.)

baedan · Jun 06 '22 14:06
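To make the off-by-one concrete, here is a minimal Python sketch (not ReinforcementLearning.jl code) of the textbook n-step TD target, where n is the number of real rewards summed before bootstrapping, G = R₁ + γR₂ + … + γⁿ⁻¹Rₙ + γⁿV(Sₙ). The helper `steps_from_library_n` is hypothetical and only encodes the convention reported in this issue (the keyword `n` equals the number of time steps minus 1); it is an assumption, not the library's API.

```python
def n_step_td_target(rewards, v_bootstrap, gamma, n_steps):
    """Textbook n-step TD target.

    rewards:     the n_steps rewards R_{t+1}, ..., R_{t+n_steps}
    v_bootstrap: current estimate V(S_{t+n_steps}) used to bootstrap
    gamma:       discount factor
    n_steps:     number of time steps (n in "n-step TD")
    """
    if n_steps < 1:
        raise ValueError("n_steps must be at least 1")
    if len(rewards) != n_steps:
        raise ValueError("expected exactly n_steps rewards")
    # Discounted sum of the observed rewards.
    g = 0.0
    for k, r in enumerate(rewards):
        g += gamma ** k * r
    # Bootstrap from the value estimate n_steps ahead.
    return g + gamma ** n_steps * v_bootstrap


def steps_from_library_n(n):
    # Hypothetical helper: assumes, as reported in this issue, that the
    # constructor keyword n means (number of time steps - 1), so the
    # default n=0 corresponds to one-step TD, i.e. TD(0).
    return n + 1


# Default keyword n=0 -> one step: target = R + gamma * V(S').
print(n_step_td_target([1.0], 2.0, 0.9, steps_from_library_n(0)))
```

Under this reading, renaming the keyword to the number of steps would just drop the `+ 1`, which is the breaking change discussed above.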

Hmm, let me examine it again when adding it back in the next release.

findmyway · Jun 06 '22 14:06