
Type of self.output in policy.py

Open CezCz opened this issue 7 years ago • 11 comments

Hey @brilee ,

I'm trying to play against this wonderful library. However when I'm trying genmove b I'm getting:

File "<MuGoDir>\MuGo\policy.py", line 152, in run
    probabilities = self.session.run(self.output, feed_dict={self.x: processed_position[None, :]})[0]
AttributeError: 'PolicyNetwork' object has no attribute 'output'

What should self.output be ?

CezCz avatar Mar 10 '18 12:03 CezCz

Looking through previous commits, I found that output used to be defined as follows, but it was deleted during the log_likelihood_cost refactor:

output = tf.nn.softmax(tf.reshape(h_conv_final, [-1, go.N ** 2]) + b_conv_final)

CezCz avatar Mar 17 '18 20:03 CezCz

Hm. Sorry about that - work on this repo is continuing at https://github.com/tensorflow/minigo. I'll update the README.md

brilee avatar Mar 17 '18 21:03 brilee

Hey,

It can result in that error. To fix the output issue, you can add this line at line 88 and it will work:

self.output = tf.nn.softmax(tf.reshape(h_conv_final, [-1, go.N ** 2]) + b_conv_final)
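For intuition, that graph node is just a softmax over the N*N per-move logits, so self.output is a probability distribution over board points. A minimal numpy sketch of what it computes (the board size N = 9 below is an illustrative assumption; MuGo reads it from go.N):

```python
import numpy as np

def softmax(logits):
    # Subtract the max for numerical stability before exponentiating.
    z = np.exp(logits - logits.max())
    return z / z.sum()

N = 9  # board size (go.N in MuGo); 9x9 used here for illustration
h_conv_final = np.random.randn(N * N)  # stand-in for the final conv layer activations
b_conv_final = np.zeros(N * N)         # stand-in for the final bias term
output = softmax(h_conv_final + b_conv_final)

# One probability per board point, summing to 1.
assert output.shape == (N * N,)
assert np.isclose(output.sum(), 1.0)
```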

However, as @brilee mentioned, MuGo is no longer developed; everything has switched to minigo. You may also want to check out Leela Zero and/or Lizzie (very easy to configure).

On Thu, 31 May 2018, JoeyQWu wrote:

Hello @CezCz @brilee, I just hit the same error, "AttributeError: 'PolicyNetwork' object has no attribute 'output'", and I want to ask whether it can cause the error "GTP Stream was Closed". What should I do so that this program runs correctly?

CezCz avatar May 31 '18 09:05 CezCz

@CezCz yeah, thanks for your kind answer. Actually, I fixed line 88 with " log_likelihood_cost = tf.reduce_sum(tf.nn.softmax_cross_entropy_with_logits(logits=logits, labels=y)) " and it works now. But I could not understand the output of MCTS: why does it often choose a move whose value is negative over one with a bigger value? I am confused about the result; I would appreciate it if you could tell me the reason @CezCz

JoeyQWu avatar Jun 05 '18 07:06 JoeyQWu

Hey,

log_likelihood_cost is a separate problem: newer TensorFlow versions require the named logits and labels parameters. I am happy that you managed to fix it. I am not sure what you mean by negative and bigger values, though. Can you provide an example?
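For reference, here is a minimal numpy sketch of what tf.nn.softmax_cross_entropy_with_logits computes for a single example (the TensorFlow name is real; the numpy reimplementation below is my own illustration, not MuGo's code):

```python
import numpy as np

def softmax_cross_entropy(logits, labels):
    # Equivalent of tf.nn.softmax_cross_entropy_with_logits for one example:
    # -sum(labels * log_softmax(logits)), computed in a numerically stable way.
    z = logits - logits.max()
    log_softmax = z - np.log(np.exp(z).sum())
    return -(labels * log_softmax).sum()

logits = np.array([2.0, 1.0, 0.1])  # hypothetical per-move scores
labels = np.array([1.0, 0.0, 0.0])  # one-hot target move
loss = softmax_cross_entropy(logits, labels)

# The loss is the negative log-probability the softmax assigns to the target.
assert np.isclose(loss, -np.log(np.exp(logits)[0] / np.exp(logits).sum()))
```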

Cezary


CezCz avatar Jun 05 '18 07:06 CezCz

1. Just like in the first picture, the white move is R4, and I get the value -7.5, as in the second image.
2. The other position is Q3,
3. and its value is 8.5.
4. So why did white choose R4 rather than Q3? The latter value is greater than the former. I am just very confused about this; perhaps I do not understand the code, or maybe this is a silly question, but I strongly want to know the reason. I am very grateful to you @CezCz, you are a very kind person, and thank you very much!

JoeyQWu avatar Jun 05 '18 14:06 JoeyQWu

Hello,

The move that is actually played is chosen solely based on visit count; what are the counts for those two nodes? I've prepared a silly image describing what might be going on (image omitted). Let's consider a search that only checks to depth 2, with only one search iteration done. The state of the tree in the picture is after backpropagation of this first search; I marked the selection with a blue pen, then the backpropagation (poorly) with black. As you can see, the value network said the value of the position is -0.98 (let's assume -1 is the maximum). We can clearly see it is bad; however, when the final move is chosen, only the visit count N is considered. In the end, the (1,1) node has the most visits, therefore it is chosen.
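The point above can be sketched in a few lines. The move names and values below come from the R4/Q3 example in this thread; the visit counts are hypothetical:

```python
# Hypothetical node statistics after search: (move, visit count N, value Q).
stats = [
    ("R4", 120, -7.5),  # heavily visited, negative value
    ("Q3", 35, 8.5),    # better value, but fewer visits
    ("C3", 10, 0.2),
]

# The move actually played maximizes the visit count, not the value:
best_move, visits, value = max(stats, key=lambda s: s[1])
assert best_move == "R4"  # chosen despite its lower (even negative) value
```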

Cezary

CezCz avatar Jun 05 '18 16:06 CezCz

Hi @CezCz, so the next move is chosen just because the algorithm picks the most-visited move; the value network's winner prediction is backpropagated along with the visit counts, with a positive value meaning the current player wins this game. The selected next move is not related to the value network's output, just to the visit count, right?

JoeyQWu avatar Jun 06 '18 06:06 JoeyQWu

@JoeyQWu For the move chosen to be played in the actual game, yes. Not to be confused with the move chosen within the selection phase: that one is chosen based on a more sophisticated heuristic that takes exploration into consideration. You may want to read:

https://jeffbradberry.com/posts/2015/09/intro-to-monte-carlo-tree-search/ - a nice MCTS introduction with examples
http://www.baeldung.com/java-monte-carlo-tree-search - a simple Monte Carlo tree search implementation
https://deepmind.com/documents/119/agz_unformatted_nature.pdf - pages 25-27, the MCTS implementation in AlphaGo Zero (don't be confused by the temperature parameter and parent visit count; these are just extra parameters to promote exploration during training, but the core is the visit count)
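A rough sketch of that selection-phase heuristic, assuming an AlphaGo Zero-style PUCT score (the c_puct constant and all numbers below are hypothetical, chosen only to illustrate the exploration bonus):

```python
import math

def puct_score(q, prior, child_visits, parent_visits, c_puct=1.0):
    # Exploitation (Q) plus an exploration bonus that is large for
    # rarely visited children and shrinks as visits accumulate.
    return q + c_puct * prior * math.sqrt(parent_visits) / (1 + child_visits)

parent_visits = 100

# A well-visited child with a decent value...
visited = puct_score(q=0.3, prior=0.2, child_visits=50, parent_visits=parent_visits)
# ...can be outscored by an unvisited child with a strong prior:
unvisited = puct_score(q=0.0, prior=0.5, child_visits=0, parent_visits=parent_visits)

assert unvisited > visited  # selection explores the fresh node first
```

This is why selection explores promising new branches, while the final played move still falls back on plain visit counts.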

CezCz avatar Jun 06 '18 06:06 CezCz

@CezCz okay, I will read more to understand it. Thank you very much, you are so nice; I am very grateful for your help!

JoeyQWu avatar Jun 06 '18 07:06 JoeyQWu

I also wrote http://www.moderndescartes.com/essays/deep_dive_mcts/ recently

brilee avatar Jun 06 '18 12:06 brilee