Parnika

Results 3 comments of Parnika

Hello @djbitbyte! You said that gumbel softmax helps to speed up and stabilize the training. I am trying to reproduce the results in PyTorch and am using **torch.nn.functional._gumbel_softmax_sample** while sampling the action...
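For anyone following along, here is a minimal, library-free sketch of what a Gumbel-softmax sample computes. It is only an illustration under assumptions, not the code from this thread: the function name `gumbel_softmax_sample`, the `tau` and `rng` parameters, and the example logits are all made up for the sketch. Note that the underscore-prefixed function mentioned above is a private helper; PyTorch also exposes a public `torch.nn.functional.gumbel_softmax`, which is probably the safer thing to call.

```python
import math
import random


def gumbel_softmax_sample(logits, tau=1.0, rng=random):
    """Draw one relaxed one-hot sample from unnormalized log-probabilities.

    Hypothetical helper for illustration; `tau` is the softmax temperature.
    """
    noisy = []
    for logit in logits:
        # Clamp u strictly inside (0, 1) so both logarithms are defined.
        u = min(max(rng.random(), 1e-12), 1.0 - 1e-12)
        # Gumbel(0, 1) noise by inverse transform: g = -log(-log(u)).
        g = -math.log(-math.log(u))
        noisy.append((logit + g) / tau)
    # Numerically stable softmax over the perturbed, temperature-scaled logits.
    m = max(noisy)
    exps = [math.exp(v - m) for v in noisy]
    total = sum(exps)
    return [e / total for e in exps]


# Example: three discrete actions with made-up logits.
rng = random.Random(0)
sample = gumbel_softmax_sample([1.0, 2.0, 0.5], tau=0.5, rng=rng)
```

The result is a probability vector over the actions. A lower `tau` pushes it closer to a one-hot vector, while a higher `tau` makes it smoother.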

Hello! Just to mention that there were many other issues in the code, apart from gumbel-softmax, that kept the training from converging.

p_reg regularizes the p values obtained from the following: `p = p_func(p_input, int(act_pdtype_n[p_index].param_shape()[0]), scope="p_func", num_units=num_units)`. Note that these p values are also used to compute the pg_loss. I don't know yet...