JORLDY
R2D2 doesn't take reward as input?
I could be wrong about this, but from the implementation it looks like the LSTM receives the state and the previous action, but not the previous reward. Was this a design decision?
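To make the question concrete, here is a minimal sketch of the per-timestep LSTM input being described. It assumes the state has already been encoded into a flat feature vector and that actions are discrete. `one_hot` and `lstm_input` are hypothetical helpers written for illustration, not part of JORLDY's API. The only difference between the two variants is whether the scalar previous reward is appended.

```python
from typing import List


def one_hot(index: int, size: int) -> List[float]:
    """Encode a discrete action index as a one-hot vector (hypothetical helper)."""
    if not 0 <= index < size:
        raise ValueError(f"action index {index} out of range for {size} actions")
    vec = [0.0] * size
    vec[index] = 1.0
    return vec


def lstm_input(state_features: List[float], prev_action: int, num_actions: int,
               prev_reward: float, include_reward: bool = True) -> List[float]:
    """Build the vector fed to the recurrent core at one timestep (hypothetical).

    Concatenates the encoded state, the one-hot previous action and,
    if include_reward is True, the scalar previous reward.
    """
    x = list(state_features) + one_hot(prev_action, num_actions)
    if include_reward:
        x.append(float(prev_reward))
    return x


# State + previous action only (what the implementation appears to do).
without_reward = lstm_input([0.5, -1.0], prev_action=2, num_actions=3,
                            prev_reward=1.0, include_reward=False)
# State + previous action + previous reward (what the question asks about).
with_reward = lstm_input([0.5, -1.0], prev_action=2, num_actions=3,
                         prev_reward=1.0)
print(without_reward)  # [0.5, -1.0, 0.0, 0.0, 1.0]
print(with_reward)     # [0.5, -1.0, 0.0, 0.0, 1.0, 1.0]
```

Including the reward only adds one input dimension to the LSTM, so the change in model size is negligible. Whether it helps is the open question here.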