JunjieChen

Results 1 issues of JunjieChen

As show in the `nnagent.py`, the author use average return of a batch as the loss function. However, it seems that such loss function only contains instantaneous reward, not average...