various eligibility trace-equipped TD methods
as far as i can tell, only off-line λ-return is implemented (TDλReturnLearner). any interest in implementing others, such as TD(λ), n-step truncated return, true online TD(λ), and so on? i'm working through a textbook chapter on eligibility traces, and i'm happy to contribute implementations.
Hi @baedan, it would be much appreciated if you could implement them.
In fact, we also need contributors to port the tabular methods to the latest workflow in the master branch.
Since we've missed the application window for GSoC and OSPP this year, I'm considering setting up GitHub sponsorship under this org to raise money for the work.
> In fact, we also need contributors to port the tabular methods to the latest workflow in the master branch.
would be great if there’s a document i can refer to for the list of intended changes in the design / a piece of example code demonstrating usage. i tried to use the new implementations for reference (QRDQN, etc) but since i’m not familiar with the underlying algorithms it’s a bit difficult haha
Good suggestion. I think the new design is fairly stable now, so I'll focus on the documentation over the next week. I'll ping you when it's ready.