RES
What are the estimated values and the estimated true values?
"Figure 4(b) shows the estimated and true values of RE-QMIX (with best λ). The estimated values are computed by averaging over 100 states sampled from the replay buffer at each timestep, and we estimate true values by averaging the discounted returns which are obtained by following the greedy policy with respect to the current Qtot starting from the sampled states. "
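Read literally, the quoted procedure could be sketched roughly as follows. This is only my reading of the paragraph, not the repository's code: `q_tot`, `step`, `estimated_value`, `true_value`, and `max_steps` are hypothetical names, `q_tot(state)` is assumed to return one value per joint action, and the environment is assumed to be restorable to a state sampled from the replay buffer (which may not be possible in every environment).

```python
from statistics import mean


def discounted_return(rewards, gamma):
    """Sum of gamma**t * r_t over one rollout, accumulated backwards."""
    g = 0.0
    for r in reversed(rewards):
        g = r + gamma * g
    return g


def greedy_action(q_values):
    """Index of the largest Q value (ties resolved to the first index)."""
    return max(range(len(q_values)), key=q_values.__getitem__)


def estimated_value(q_tot, states):
    """Average of max_a Q_tot(s, a) over the sampled states."""
    return mean(max(q_tot(s)) for s in states)


def true_value(q_tot, step, states, gamma, max_steps):
    """Average discounted return of the greedy policy w.r.t. q_tot,
    rolled out from each sampled state.

    `step(state, action)` is a hypothetical function returning
    (next_state, reward, done); rollouts are cut off at `max_steps`.
    """
    returns = []
    for s in states:
        rewards, done, t = [], False, 0
        while not done and t < max_steps:
            a = greedy_action(q_tot(s))
            s, r, done = step(s, a)
            rewards.append(r)
            t += 1
        returns.append(discounted_return(rewards, gamma))
    return mean(returns)


# Toy stand-ins (assumptions for illustration only).
def toy_q_tot(state):
    # Constant Q values that deliberately overestimate: [stay, move right].
    return [0.0, 5.0]


def toy_step(state, action):
    # Action 1 moves right with reward 1; the episode ends at state 3.
    if action == 1:
        return state + 1, 1.0, state + 1 == 3
    return state, 0.0, False


sampled_states = [0, 2]  # stands in for the 100 states drawn from the buffer
est = estimated_value(toy_q_tot, sampled_states)
true = true_value(toy_q_tot, toy_step, sampled_states, gamma=0.5, max_steps=10)
```

In this sketch the two curves of Figure 4(b) would correspond to `est` and `true`, recomputed at each logging timestep with the current `q_tot`.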
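How exactly are the true values and the estimated values in Figure 4(b) plotted, and which variables in the code are used? As I understand it, the estimated values in Figure 4(a) and in Figure 4(b) should be the same, just without the log applied, but I don't know how the true values in Figure 4(b) are plotted. Could you provide the specific implementation?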