Katsuki Ohto

Results 26 comments of Katsuki Ohto

The opponent player in the evaluation phase is a random player by default. I think the winning rate of a perfect player versus a random player is about 98% in...

Thanks for your suggestion. Selecting opponents is exactly what we are considering right now. Do you have any good ideas for how to specify the old model in the configuration? By the way, comparing...

@kenoss Thank you for your nice suggestion! We have also been considering making HandyRL a library, though I'm worried about compatibility problems. Currently, HandyRL is shared as a **tool** which...

Thanks for your report! We ran several experiments with 64 workers, and all the training runs were successful. However, it is not easy to learn non-legal moves in this task, and...

Yes, `gamma: [1, 0.99]` will work. A warning for over-length lists needs to be added.
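For context, such a gamma list would live in the training configuration; a hypothetical HandyRL-style YAML fragment (the `train_args` nesting is an assumption — check the project's own `config.yaml` for the exact layout):

```yaml
# Hypothetical config.yaml fragment: one discount factor per reward channel.
train_args:
    gamma: [1, 0.99]   # channel 0 undiscounted, channel 1 discounted by 0.99
```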

Actually, my sample is also failing to reach the optimal policy. I think MuZero might need many self-play games in the early stage to obtain a good abstract transition model. I saw...

Hi, constructing one big net is not necessary in this case. You can pass the parameters of all the networks to a single optimizer to train them together. The following code will work. --- rep, dyn, pre =...
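To illustrate the idea with a framework-free sketch: the names `rep`, `dyn`, `pre` come from the comment above, and the toy `SGD` class stands in for a real optimizer such as `torch.optim.SGD`, which likewise accepts parameters gathered from several separate modules.

```python
class Net:
    """Toy 'network': a named bag of scalar parameters."""
    def __init__(self, name, n):
        self.name = name
        self.params = [1.0] * n

    def parameters(self):
        return self.params


class SGD:
    """Toy optimizer: one instance updates parameters from any number of nets
    (mirroring how a real optimizer accepts a chained parameter iterable)."""
    def __init__(self, param_lists, lr=0.5):
        self.param_lists = [p for p in param_lists]  # keep references, not copies
        self.lr = lr

    def step(self, grad_fn):
        for plist in self.param_lists:
            for i, p in enumerate(plist):
                plist[i] = p - self.lr * grad_fn(p)


rep, dyn, pre = Net("rep", 2), Net("dyn", 3), Net("pre", 1)
# One optimizer sees every parameter of all three nets -- no wrapper net needed.
opt = SGD([rep.parameters(), dyn.parameters(), pre.parameters()])
opt.step(lambda p: 2 * p)  # gradient of the toy loss p**2
print(rep.params, dyn.params, pre.params)  # every net's parameters moved: 1.0 -> 0.0
```

The same pattern in PyTorch would chain `rep.parameters()`, `dyn.parameters()`, and `pre.parameters()` into one optimizer constructor call.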

Hi, these days I'm not touching the MuZero code. I would appreciate it if you could find new key points for achieving good results. After other RL experiments, I found that ReLU...

Hi, @ZHANGRUI666 I found a careless bug in the tree search method. The encoded abstract state was not being updated when descending the search tree. Unbelievable! After I fixed it, training looks to be going well.
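The shape of the fix can be sketched as follows. This is a toy, not the sample's actual code: the abstract state is just a tuple of past actions, and `dynamics` stands in for MuZero's dynamics network g(s, a). The key point is that the abstract state is re-encoded at every level of the descent instead of being reused.

```python
def dynamics(state, action):
    """Hypothetical stand-in for the dynamics network: s' = g(s, a)."""
    return state + (action,)


def select_action(state):
    """Deterministic toy policy, just to make the descent reproducible."""
    return len(state) % 2


def descend(root_state, depth):
    # The fix from the comment: apply the dynamics model at EVERY step
    # down the tree so the abstract state matches the node being visited.
    state = root_state
    path = []
    for _ in range(depth):
        a = select_action(state)
        state = dynamics(state, a)  # update the encoded abstract state
        path.append((a, state))
    return path


print(descend((), 3))  # [(0, (0,)), (1, (0, 1)), (0, (0, 1, 0))]
```

With the bug, `state` would stay equal to `root_state` on every level, so every node in the search tree would be evaluated on the root's encoding.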

That's a good idea. However, I think this sample code is not enough to solve practical problems. We should parallelize episode generation and use many CPUs.
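A minimal sketch of parallel episode generation, assuming a hypothetical `generate_episode` worker. In real training you would use separate processes (as HandyRL's workers do) since self-play is CPU-bound; a thread pool is used here only to keep the sketch short and portable.

```python
from multiprocessing.pool import ThreadPool  # processes in practice; threads for portability
import random

def generate_episode(seed):
    """Hypothetical self-play worker: plays one game, returns its transitions."""
    rng = random.Random(seed)
    return [("state_%d" % t, rng.choice([-1, 1])) for t in range(5)]

# Run many episode generators in parallel, then hand the batch to the learner.
with ThreadPool(4) as pool:
    episodes = pool.map(generate_episode, range(16))
print(len(episodes))  # 16 episodes collected in parallel
```

Scaling this up to many machines (rather than many local workers) is exactly the distributed worker/server split that a tool like HandyRL provides.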