
[Feature Request] Extend TDLambdaEstimator with QLambdaEstimator

Open roger-creus opened this issue 1 year ago • 0 comments

Motivation

I am attempting to implement Parallel Q Networks (online DQN without a replay buffer or target networks). The method uses Q(λ) returns.

Solution

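TDLambdaEstimator expects state_value keys, but a QLambdaEstimator would need action_value keys instead, since the bootstrap value has to be taken as the maximum over the next state's action values.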
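
To make the request concrete, here is a minimal pure-Python sketch of the kind of return such an estimator could compute. It assumes the Peng-style Q(λ) recursion, where each target blends the one-step bootstrap max_a Q(s_{t+1}, a) with the λ-return of the next step. The function name `q_lambda_returns` and its argument layout are hypothetical and are not part of TorchRL's API.

```python
from typing import List, Sequence


def q_lambda_returns(
    rewards: Sequence[float],
    next_action_values: Sequence[Sequence[float]],
    dones: Sequence[bool],
    gamma: float = 0.99,
    lmbda: float = 0.65,
) -> List[float]:
    """Hypothetical helper: Q(lambda) targets for a single trajectory.

    rewards[t]            -- reward received after acting in s_t
    next_action_values[t] -- Q(s_{t+1}, a) for every action a
    dones[t]              -- True if s_{t+1} is terminal
    """
    num_steps = len(rewards)
    returns = [0.0] * num_steps
    next_return = None  # lambda-return of step t+1; unknown past the last step

    # Walk the trajectory backwards so each step can reuse the next step's return.
    for t in reversed(range(num_steps)):
        next_max_q = max(next_action_values[t])  # greedy bootstrap value
        if dones[t]:
            # No bootstrapping across a terminal transition.
            ret = rewards[t]
        else:
            if next_return is None:
                # Last step of the rollout: fall back to the one-step bootstrap.
                blended = next_max_q
            else:
                blended = (1.0 - lmbda) * next_max_q + lmbda * next_return
            ret = rewards[t] + gamma * blended
        returns[t] = ret
        next_return = ret
    return returns


if __name__ == "__main__":
    targets = q_lambda_returns(
        rewards=[1.0, 1.0],
        next_action_values=[[0.0, 2.0], [1.0, 3.0]],
        dones=[False, False],
        gamma=0.5,
        lmbda=0.5,
    )
    print(targets)
```

The only structural difference from a state-value λ-return in this sketch is the `max` over `next_action_values`, which is why the estimator would need to read an action_value entry rather than state_value.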

Checklist

  • [x] I have checked that there is no similar issue in the repo (required)

roger-creus, Aug 13 '24 23:08