Parnika
Hello @djbitbyte! You said that Gumbel-softmax helps to speed up and stabilize the training. I am trying to reproduce the results in PyTorch and am using `torch.nn.functional._gumbel_softmax_sample` to sample the action...
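For reference, this is roughly what a Gumbel-softmax action sample computes. It is a minimal stdlib-only sketch, not the PyTorch internals. The function name `gumbel_softmax_sample` and its `temperature`/`rng` parameters are illustrative. Recent PyTorch versions also expose a public `torch.nn.functional.gumbel_softmax(logits, tau=..., hard=...)`, which may be preferable to the private `_gumbel_softmax_sample`.

```python
import math
import random


def gumbel_softmax_sample(logits, temperature=1.0, rng=random):
    """Draw one relaxed (soft) one-hot sample from unnormalized categorical logits."""
    noisy = []
    for logit in logits:
        # Uniform(0, 1) draw, clamped away from 0 so log() never sees 0
        u = max(rng.random(), 1e-20)
        # Gumbel(0, 1) noise: g = -log(-log(u))
        g = -math.log(-math.log(u))
        # Perturb the logit and scale by the temperature
        noisy.append((logit + g) / temperature)
    # Numerically stable softmax over the perturbed, scaled logits
    m = max(noisy)
    exps = [math.exp(x - m) for x in noisy]
    total = sum(exps)
    return [e / total for e in exps]


# Usage: lower temperatures push the sample closer to a one-hot vector
sample = gumbel_softmax_sample([1.0, 2.0, 0.5], temperature=0.5, rng=random.Random(0))
```

The sample stays differentiable with respect to the logits, which is why it can replace a hard categorical draw when the action is fed into a critic.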
Hello! Just to mention that, besides the Gumbel-softmax, there were many other issues in the code that kept the training from converging.
`p_reg` regularizes the `p` values obtained from the following line: `p = p_func(p_input, int(act_pdtype_n[p_index].param_shape()[0]), scope="p_func", num_units=num_units)`. Notice that these same `p` values are also used to compute `pg_loss`. I don't know yet...
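To make the relationship concrete, here is a minimal stdlib-only sketch of how `p` can feed both terms of the actor loss. It assumes a MADDPG-style setup where `p_reg` is the mean of the squared logits, `pg_loss` is the negative mean Q-value of the actions sampled from `p`, and the two are combined with a small coefficient (1e-3 here, an assumption). `actor_loss`, `q_fn`, and `reg_coef` are hypothetical names, not the original code.

```python
import math
import random


def gumbel_softmax_sample(logits, temperature=1.0, rng=random):
    # Relaxed one-hot action: softmax((logits + Gumbel noise) / temperature)
    noisy = [(l - math.log(-math.log(max(rng.random(), 1e-20)))) / temperature
             for l in logits]
    m = max(noisy)
    exps = [math.exp(x - m) for x in noisy]
    total = sum(exps)
    return [e / total for e in exps]


def actor_loss(p_batch, q_fn, reg_coef=1e-3, temperature=1.0, rng=random):
    # p_batch: list of logit vectors (the "p" values), one per state in the batch
    # q_fn: hypothetical critic mapping an action vector to a scalar Q-value
    actions = [gumbel_softmax_sample(p, temperature, rng) for p in p_batch]
    # pg_loss: p reaches this term only through the sampled actions given to the critic
    pg_loss = -sum(q_fn(a) for a in actions) / len(actions)
    # p_reg: mean squared logit, penalizing large (over-confident) p values
    flat = [x for p in p_batch for x in p]
    p_reg = sum(x * x for x in flat) / len(flat)
    return pg_loss + reg_coef * p_reg, pg_loss, p_reg


# Usage: a toy critic that prefers action 0
loss, pg_loss, p_reg = actor_loss([[2.0, 0.0], [0.0, 1.0]], q_fn=lambda a: a[0],
                                  rng=random.Random(0))
```

Without `p_reg`, the optimizer can keep increasing the logits to make the sampled actions more deterministic; the squared penalty keeps them bounded.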