[Feature Request] Extend TDLambdaEstimator with QLambdaEstimator

Open roger-creus opened this issue 1 year ago • 0 comments

Motivation

I am trying to implement Parallel Q Networks (online DQN without a replay buffer or target networks), which trains on QLambda returns.

TDLambdaEstimator expects state_value keys, but QLambda returns would need action_value keys instead.
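For reference, a minimal sketch of the return computation I have in mind. It assumes the recursive Q(λ) form that bootstraps from the max over next-step action values: G_t = r_t + γ (1 - done_t) [(1 - λ) max_a Q(s_{t+1}, a) + λ G_{t+1}]. The function name and arguments are hypothetical and plain Python, not part of the existing estimator API.

```python
from typing import List, Sequence


def q_lambda_returns(
    rewards: Sequence[float],
    next_action_values: Sequence[Sequence[float]],
    dones: Sequence[bool],
    gamma: float,
    lam: float,
) -> List[float]:
    """Hypothetical helper: Q(lambda) returns over one trajectory.

    next_action_values[t] holds Q(s_{t+1}, a) for every action a,
    which is the "action_value" input a state-value estimator lacks.
    """
    horizon = len(rewards)
    returns = [0.0] * horizon
    for t in reversed(range(horizon)):
        # Greedy bootstrap value taken from the action values.
        next_max = max(next_action_values[t])
        if t == horizon - 1:
            # Last step of the rollout: nothing to recurse on, so bootstrap fully.
            bootstrap = next_max
        else:
            # Mix the one-step bootstrap with the longer return.
            bootstrap = (1.0 - lam) * next_max + lam * returns[t + 1]
        # A terminal transition cuts off everything after the reward.
        not_done = 0.0 if dones[t] else 1.0
        returns[t] = rewards[t] + gamma * not_done * bootstrap
    return returns
```

With lam=0 this reduces to the usual one-step Q-learning target, and with lam=1 it becomes the discounted return bootstrapped only at the end of the rollout. The only difference from a state-value TD(λ) recursion is that the bootstrap term is the max over action values rather than a state value.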

Aug 13 '24 23:08 roger-creus