MyAlphaGoZeroOnConnect4

Question about the network used

Open lijas opened this issue 6 years ago • 3 comments

Hello

Can you tell me about the model you're using? How many conv layers, residual layers, etc.?

lijas, Mar 03 '19 16:03

If I remember correctly, the network has 1 input layer followed by 5 mid layers. Each of these mid layers uses both convolution and a residual connection. Then the network splits into 2 branches: a value head and a policy head.
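To make that layout concrete, here is a rough sketch of the topology only, written as plain Python with no deep learning framework. The layer names (`input_conv`, `mid_1`, ...) and the `LayerSpec` type are made up for illustration; filter counts, kernel sizes, and the internals of each head are not stated in this thread, so they are left out rather than guessed.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class LayerSpec:
    name: str          # hypothetical label, not taken from the repository
    kind: str          # "conv", "residual", or "head"
    inputs: List[str]  # names of the layers feeding this one


def build_layout(num_mid_layers: int = 5) -> List[LayerSpec]:
    """Return the layer layout described in the thread as a flat list."""
    layers = [LayerSpec("input_conv", "conv", ["board"])]
    previous = "input_conv"
    for i in range(1, num_mid_layers + 1):
        name = f"mid_{i}"
        # Each mid layer combines convolution with a skip connection,
        # i.e. its output is roughly conv(previous) + previous.
        layers.append(LayerSpec(name, "residual", [previous]))
        previous = name
    # Both heads branch off the output of the last mid layer.
    layers.append(LayerSpec("value_head", "head", [previous]))
    layers.append(LayerSpec("policy_head", "head", [previous]))
    return layers


if __name__ == "__main__":
    for layer in build_layout():
        print(f"{layer.name:12s} {layer.kind:9s} <- {', '.join(layer.inputs)}")
```

The point to notice is that the value head and the policy head share the same trunk: both take the output of the fifth mid layer as input.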

yichen914, Mar 03 '19 23:03

OK, thank you. And a question about the temperature parameter: for how many moves do you use tau=1 before changing to tau=0? Is the temperature only applied during self-play, or also when playing real games?

lijas, Mar 04 '19 06:03

Re: how many moves before changing tau from 1 to 0, I think it depends on your experience. In my code, I set it to 10 steps. The temperature balances exploration and exploitation. When we train the network (self-play), we need it to explore enough before it commits to a particular path. But when we compare or test the network, we assume that the path the network takes is the "best" (so far), so we don't need to set the temperature to 1.
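As an illustration of that schedule, here is a minimal sketch, not the code from the repository. It assumes the usual AlphaGo Zero convention that moves are picked from MCTS visit counts with probability proportional to N^(1/tau), and that move numbers start at 0; the function names and `TAU_SWITCH_MOVE` are hypothetical. Only the value 10 and the self-play vs. evaluation distinction come from this thread.

```python
import random
from typing import Dict, Optional

TAU_SWITCH_MOVE = 10  # from the thread: tau = 1 for the first 10 steps


def temperature_for(move_number: int, self_play: bool) -> float:
    """tau = 1 early in self-play; tau = 0 afterwards and in evaluation."""
    if self_play and move_number < TAU_SWITCH_MOVE:
        return 1.0
    return 0.0


def select_move(visit_counts: Dict[int, int], tau: float,
                rng: Optional[random.Random] = None) -> int:
    """Pick a column from visit counts (assumed to come from MCTS)."""
    if not visit_counts:
        raise ValueError("visit_counts must not be empty")
    if tau == 0.0:
        # Greedy: always play the most visited move.
        return max(visit_counts, key=visit_counts.get)
    # Exploratory: sample in proportion to N ** (1 / tau).
    # Assumes at least one move has a non-zero visit count.
    moves = list(visit_counts)
    weights = [visit_counts[m] ** (1.0 / tau) for m in moves]
    chooser = rng if rng is not None else random
    return chooser.choices(moves, weights=weights, k=1)[0]


if __name__ == "__main__":
    counts = {0: 12, 3: 50, 4: 7}  # made-up visit counts per column
    print(select_move(counts, temperature_for(2, self_play=True)))   # sampled
    print(select_move(counts, temperature_for(15, self_play=True)))  # greedy
```

With tau=0 the choice is deterministic, which is why it suits comparing or testing networks; with tau=1 less visited columns still get played some of the time, which gives self-play more varied openings.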

yichen914, Mar 04 '19 22:03