HandyRL
feature: apply lambda=1 at timesteps that have no value output
When training on episodes that include steps with no value output, lambda must be set to 1 locally at those steps, so that the target skips the missing value and relies on the return from later steps instead.
I am not sure whether VTrace is correct under this change.
This is ugly and complicated, I think...
VTrace appears to work well in experiments with Geister.
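
For reference, the local-lambda rule described above can be sketched for plain TD(lambda) targets as follows. This is only an illustration under assumptions, not the code in this PR: the function name `lambda_returns`, its arguments, and the `value_mask` convention (1 where a value was output, 0 where it was not) are all hypothetical, and it does not cover the VTrace case.

```python
def lambda_returns(rewards, values, value_mask, final_return, gamma=1.0, lam=0.7):
    """Backward TD(lambda) targets, forcing lambda to 1 where no value exists.

    All names here are hypothetical, chosen for illustration only.
    rewards[t]    : reward received after step t
    values[t]     : value estimate at step t (ignored when value_mask[t] == 0)
    value_mask[t] : 1 if a value was output at step t, else 0
    final_return  : target used after the last step (e.g. the episode outcome)
    """
    T = len(rewards)
    targets = [0.0] * T
    # State carried backward from step t+1. After the last step there is
    # nothing to bootstrap from, so lambda is 1 and the final return is used.
    next_target, next_value, next_lam = final_return, 0.0, 1.0
    for t in reversed(range(T)):
        # Mix the next step's value estimate with the next step's target.
        bootstrap = (1.0 - next_lam) * next_value + next_lam * next_target
        targets[t] = rewards[t] + gamma * bootstrap
        has_value = bool(value_mask[t])
        next_target = targets[t]
        # Where no value exists, use a placeholder and set lambda to 1 locally,
        # so the placeholder gets zero weight in the step before it.
        next_value = values[t] if has_value else 0.0
        next_lam = lam if has_value else 1.0
    return targets
```

With `lam=0.5`, `gamma=1.0`, rewards `[0, 0, 1]`, and no value at step 1, the target at step 0 depends only on the target at step 1, so whatever is stored in `values[1]` has no effect on the result.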