Katsuki Ohto

Results 26 comments of Katsuki Ohto

The opponent player in the evaluation phase is a random player by default. I think the winning rate of a perfect player versus a random player is about 98% in...

Thanks for your suggestion. Selecting opponents is exactly what we are considering right now. Do you have any good ideas for how to specify the old model in the configuration? By the way, comparing...

@kenoss Thank you for your nice suggestion! We have also been considering making HandyRL a library, though I'm worried about compatibility problems. Currently, HandyRL is shared as a **tool** which...

Thanks for your report! We ran several experiments with 64 workers, and all the training runs were successful. However, it is not easy to learn non-legal moves in this task, and...

Yes, `gamma: [1, 0.99]` will work. A warning for over-length lists needs to be added.
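For context, such a gamma list would live in the training configuration; a hypothetical HandyRL-style YAML fragment (the `train_args` nesting is an assumption — check the project's own `config.yaml` for the exact layout):

```yaml
# Hypothetical config.yaml fragment: one discount factor per reward channel.
train_args:
    gamma: [1, 0.99]   # channel 0 undiscounted, channel 1 discounted by 0.99
```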

Actually, my sample is also failing to reach the optimal policy. I think MuZero might need many self-play games in the early stage to obtain a good abstract transition model. I saw...

Hi, constructing one big net is not necessary in this case. You can pass the parameters of all the networks to a single optimizer to train them together. The following code will work. --- rep, dyn, pre =...
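To illustrate the idea with a framework-free sketch: the names `rep`, `dyn`, `pre` come from the comment above, and the toy `SGD` class stands in for a real optimizer such as `torch.optim.SGD`, which likewise accepts parameters gathered from several separate modules.

```python
class Net:
    """Toy 'network': a named bag of scalar parameters."""
    def __init__(self, name, n):
        self.name = name
        self.params = [1.0] * n

    def parameters(self):
        return self.params


class SGD:
    """Toy optimizer: one instance updates parameters from any number of nets
    (mirroring how a real optimizer accepts a chained parameter iterable)."""
    def __init__(self, param_lists, lr=0.5):
        self.param_lists = [p for p in param_lists]  # keep references, not copies
        self.lr = lr

    def step(self, grad_fn):
        for plist in self.param_lists:
            for i, p in enumerate(plist):
                plist[i] = p - self.lr * grad_fn(p)


rep, dyn, pre = Net("rep", 2), Net("dyn", 3), Net("pre", 1)
# One optimizer sees every parameter of all three nets -- no wrapper net needed.
opt = SGD([rep.parameters(), dyn.parameters(), pre.parameters()])
opt.step(lambda p: 2 * p)  # gradient of the toy loss p**2
print(rep.params, dyn.params, pre.params)  # every net's parameters moved: 1.0 -> 0.0
```

The same pattern in PyTorch would chain `rep.parameters()`, `dyn.parameters()`, and `pre.parameters()` into one optimizer constructor call.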

Hi, these days I'm not touching the MuZero code. I would appreciate it if you could find new key points for achieving good results. After other RL experiments, I found that ReLU...

Hi, @ZHANGRUI666 I found a careless bug in the tree search method. The encoded abstract state was not being updated when descending the search tree. Unbelievable! After I fixed it, training looks to be going well.
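The shape of the fix can be sketched as follows. This is a toy, not the sample's actual code: the abstract state is just a tuple of past actions, and `dynamics` stands in for MuZero's dynamics network g(s, a). The key point is that the abstract state is re-encoded at every level of the descent instead of being reused.

```python
def dynamics(state, action):
    """Hypothetical stand-in for the dynamics network: s' = g(s, a)."""
    return state + (action,)


def select_action(state):
    """Deterministic toy policy, just to make the descent reproducible."""
    return len(state) % 2


def descend(root_state, depth):
    # The fix from the comment: apply the dynamics model at EVERY step
    # down the tree so the abstract state matches the node being visited.
    state = root_state
    path = []
    for _ in range(depth):
        a = select_action(state)
        state = dynamics(state, a)  # update the encoded abstract state
        path.append((a, state))
    return path


print(descend((), 3))  # [(0, (0,)), (1, (0, 1)), (0, (0, 1, 0))]
```

With the bug, `state` would stay equal to `root_state` on every level, so every node in the search tree would be evaluated on the root's encoding.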

That's a good idea. However, I think this sample code is not enough to solve practical problems. We should parallelize episode generation and use many CPUs.
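A minimal sketch of parallel episode generation, assuming a hypothetical `generate_episode` worker. In real training you would use separate processes (as HandyRL's workers do) since self-play is CPU-bound; a thread pool is used here only to keep the sketch short and portable.

```python
from multiprocessing.pool import ThreadPool  # processes in practice; threads for portability
import random

def generate_episode(seed):
    """Hypothetical self-play worker: plays one game, returns its transitions."""
    rng = random.Random(seed)
    return [("state_%d" % t, rng.choice([-1, 1])) for t in range(5)]

# Run many episode generators in parallel, then hand the batch to the learner.
with ThreadPool(4) as pool:
    episodes = pool.map(generate_episode, range(16))
print(len(episodes))  # 16 episodes collected in parallel
```

Scaling this up to many machines (rather than many local workers) is exactly the distributed worker/server split that a tool like HandyRL provides.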