Katsuki Ohto

35 issue results from Katsuki Ohto

Combining #245 and #246, it seems natural to implement it like this.

…urn players. There is a case (training vs other agents) where, as a result, m['policy'][m['turn'][0]] is None.
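A minimal sketch of guarding against that case when assembling training data; the `moments` list and its structure mirror the snippet above, and all names here are assumptions for illustration rather than the repository's actual code:

```
# Sketch: skip moments where the turn player's policy output is missing,
# e.g. because that step was played by another (non-training) agent.
usable_moments = [
    m for m in moments
    if m['policy'][m['turn'][0]] is not None
]
```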

For training that involves steps where no value output exists, lambda needs to be set to 1 locally at those steps. I am not sure whether V-Trace is correct in this case.
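A minimal sketch of what "set lambda to 1 locally" could mean when computing lambda-returns; the function, argument names, and the per-step `has_value` mask are assumptions for illustration, not the repository's actual training code:

```
def lambda_returns(rewards, values, has_value, gamma=1.0, lam=0.7):
    # rewards[t]:   reward observed at step t
    # values[t]:    value estimate V(s_t); only meaningful where has_value[t] is True
    # has_value[t]: whether the model actually output a value at step t
    T = len(rewards)
    returns = [0.0] * T
    next_return = 0.0   # lambda-return of the following step; 0 past the end
    next_value = 0.0    # V(s_{t+1}); 0 past the end
    for t in reversed(range(T)):
        # G_t = r_t + gamma * ((1 - lam) * V(s_{t+1}) + lam * G_{t+1})
        returns[t] = rewards[t] + gamma * ((1.0 - lam) * next_value + lam * next_return)
        next_return = returns[t]
        if has_value[t]:
            next_value = values[t]
        else:
            # No value output at this step: fall back to lambda = 1 locally,
            # i.e. bootstrap from the return itself instead of the missing value.
            next_value = next_return
    return returns
```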

I have never dealt with the `fineno()` interface. Is it useful?

Do we delete OUTCOME, or use OUTCOME as the first dimension of REWARD if it is defined?
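A tiny sketch of the second option, with OUTCOME folded in as the first component of REWARD; the names and layout are assumptions for illustration, not the actual specification:

```
# Sketch: OUTCOME becomes the first component of REWARD.
outcome = 1.0                       # win/draw/loss signal, formerly OUTCOME
step_rewards = [0.0, 0.1]           # any additional dense rewards
reward = [outcome] + step_rewards   # REWARD whose first dimension is the outcome
```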

Do we have to report not only terminal outcomes but also total rewards?

Is `prepare_env()` really necessary?

Turn-based batch creation and zero-sum averaging are different, independent options. Moreover, both should default to False for safety.
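A small sketch of how the two options could be exposed, both off by default; the key names are assumptions, not the actual configuration schema:

```
# Sketch: two independent flags, both defaulting to False for safety.
train_args = {
    'turn_based_batch': False,   # build batches only from the turn player's steps
    'zero_sum_average': False,   # subtract the cross-player mean from rewards
}
```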

Example:

```
# SimpleConv2dModel2, Learner, and args are assumed to be defined elsewhere in the project.
from torch.optim import Adam

net = SimpleConv2dModel2()
optim = Adam(net.parameters(), lr=1e-4)
learner = Learner(args=args, net=net, optim=optim, remote=False)
learner.run()
```

This PR requires #170 (return a model instance by calling net()).