droiter

Results: 2 issues by droiter

2018-06-22: share count of 511280 中期信用ETF (Medium-Term Credit ETF) is 0

I think td_error in AC is the same as the advantage in the baseline solution: both are the reward minus the predicted value. One difference is that the AC value network learns with TD, baseline...
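To make the comparison in this issue concrete, here is a minimal sketch of the two quantities as they are usually defined. It is not taken from the repository under discussion: the function names `td_error` and `mc_advantages`, the `gamma` default, and the sample numbers are all hypothetical. Under the common definitions, the one-step TD error bootstraps from the next state's predicted value, while the baseline advantage subtracts the predicted value from the full Monte Carlo return.

```python
def td_error(reward, value_s, value_next, gamma=0.9, done=False):
    """One-step TD error: r + gamma * V(s') - V(s).

    Assumed actor-critic form; V(s') is dropped on terminal steps.
    """
    bootstrap = 0.0 if done else gamma * value_next
    return reward + bootstrap - value_s


def mc_advantages(rewards, values, gamma=0.9):
    """Baseline advantage per step: Monte Carlo return G_t minus V(s_t).

    Assumed REINFORCE-with-baseline form over one finished episode.
    """
    advantages = []
    g = 0.0
    # Walk the episode backwards to accumulate discounted returns.
    for r, v in zip(reversed(rewards), reversed(values)):
        g = r + gamma * g
        advantages.append(g - v)
    advantages.reverse()
    return advantages


# Hypothetical episode: rewards and the critic's value predictions.
rewards = [1.0, 0.0, 2.0]
values = [0.5, 0.4, 1.0]

step0_td = td_error(rewards[0], values[0], values[1], gamma=0.5)
episode_adv = mc_advantages(rewards, values, gamma=0.5)
```

Both quantities have the form "target minus predicted value"; in this sketch they differ only in the target (a bootstrapped one-step estimate versus the full discounted return), which is why they generally give different numbers for the same step.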