[Feature Request] Extend TDLambdaEstimator with QLambdaEstimator
Motivation
I am trying to implement Parallel Q Networks (PQN), i.e. online DQN without a replay buffer or target networks. PQN computes its targets from Q(lambda) returns, which the current value estimators do not provide.
Solution
TDLambdaEstimator expects state_value keys, but a QLambdaEstimator would need to read action_value keys instead and bootstrap from the maximum action value of the next state.
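To make the request concrete, here is a minimal sketch of the return computation I have in mind, written in plain Python rather than against the library. It assumes the Peng-style Q(lambda) recursion G_t = r_t + gamma * [(1 - lambda) * max_a Q(s_{t+1}, a) + lambda * G_{t+1}], with the last step bootstrapped from max_a Q alone. The function name, argument names, and the list-based layout are all hypothetical and are not part of the existing API.

```python
from typing import List, Sequence


def q_lambda_returns(
    rewards: Sequence[float],
    next_action_values: Sequence[Sequence[float]],
    dones: Sequence[bool],
    gamma: float = 0.99,
    lmbda: float = 0.95,
) -> List[float]:
    """Hypothetical helper: Q(lambda) returns for one trajectory.

    rewards[t]            reward received at step t
    next_action_values[t] Q(s_{t+1}, a) for every action a
    dones[t]              True if the episode ended at step t
    """
    num_steps = len(rewards)
    returns = [0.0] * num_steps
    next_return = 0.0  # overwritten before first use

    # Walk the trajectory backwards so G_{t+1} is known when computing G_t.
    for t in reversed(range(num_steps)):
        next_max_q = max(next_action_values[t])
        if t == num_steps - 1:
            # No later return is available: bootstrap from max_a Q only.
            bootstrap = next_max_q
        else:
            # Mix the one-step bootstrap with the longer lambda-return.
            bootstrap = (1.0 - lmbda) * next_max_q + lmbda * next_return
        # Drop the bootstrap entirely when the episode ended at step t.
        not_done = 0.0 if dones[t] else 1.0
        returns[t] = rewards[t] + gamma * not_done * bootstrap
        next_return = returns[t]
    return returns
```

With lmbda=0 this reduces to the usual one-step DQN target, so the only structural difference from the state-value case is that the bootstrap term is a max over action_value rather than a read of state_value.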
Checklist
- [x] I have checked that there is no similar issue in the repo (required)