JunjieChen
Results
1
issues of
JunjieChen
As show in the `nnagent.py`, the author use average return of a batch as the loss function. However, it seems that such loss function only contains instantaneous reward, not average...