maddpg
Two questions about the update function
I have two questions about the update function after reading your code. Could anyone explain them to me? I would really appreciate it. Firstly, I can't understand what role the variable "num_sample" plays when training the q network:
# train q network
num_sample = 1
target_q = 0.0
for i in range(num_sample):
    target_act_next_n = [agents[i].p_debug['target_act'](obs_next_n[i]) for i in range(self.n)]
    target_q_next = self.q_debug['target_q_values'](*(obs_next_n + target_act_next_n))
    target_q += rew + self.args.gamma * (1.0 - done) * target_q_next
target_q /= num_sample
Secondly, why is the loss of p defined as loss = pg_loss + p_reg * 1e-3, and what role does p_reg play in the loss?
p_reg regularizes the p values obtained from the following:
p = p_func(p_input, int(act_pdtype_n[p_index].param_shape()[0]), scope="p_func", num_units=num_units)
Note that these p values are also used to compute pg_loss.
I don't yet know why num_sample is used, because setting it to 1 does not seem to do anything useful.
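To make the role of the regularizer concrete, here is a minimal sketch of how the two terms could combine. It assumes pg_loss is the negated mean Q value and p_reg is the mean squared policy output; both definitions are assumptions for illustration rather than quotes from the repository, and policy_loss is a hypothetical helper.

```python
def policy_loss(p_values, q_values, reg_coef=1e-3):
    """Return (loss, pg_loss, p_reg) for flat lists of policy outputs and Q values."""
    # Assumed policy-gradient term: maximizing Q means minimizing -mean(Q).
    pg_loss = -sum(q_values) / len(q_values)
    # Assumed regularizer: mean squared policy output, which penalizes large values.
    p_reg = sum(p * p for p in p_values) / len(p_values)
    return pg_loss + reg_coef * p_reg, pg_loss, p_reg


q = [1.0, 2.0]
p_small = [0.1, -0.2, 0.3, 0.0]
p_large = [10.0 * p for p in p_small]

loss_small, pg_small, reg_small = policy_loss(p_small, q)
loss_large, pg_large, reg_large = policy_loss(p_large, q)
print(loss_small, loss_large)
```

With the same Q values, the larger policy outputs give a larger loss, so under these assumptions the 1e-3 term only adds a small pull toward small outputs while pg_loss stays the dominant term.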
@kevin-y-ma @Ah31 Could you reproduce the results? Also, num_sample = 1 seems to be a bug. I think it should be num_sample = len(obs_next_n)?
I think it's there to take the expectation of the Bellman error target, since you need to marginalize over the next actions when evaluating the next Q value.
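The averaging idea can be sketched as a Monte Carlo estimate of the target. This is only an illustration under stated assumptions: estimate_target_q, target_policies, and target_q_fn are hypothetical names, the toy Q function just sums the actions, and the loop variables are deliberately given distinct names.

```python
import random


def estimate_target_q(rew, done, gamma, obs_next_n, target_policies,
                      target_q_fn, num_sample=1):
    """Average the one-step Bellman target over num_sample draws of next actions."""
    target_q = 0.0
    for _ in range(num_sample):
        # One (possibly stochastic) next action per agent from its target policy.
        target_act_next_n = [pi(obs) for pi, obs in zip(target_policies, obs_next_n)]
        target_q_next = target_q_fn(obs_next_n, target_act_next_n)
        target_q += rew + gamma * (1.0 - done) * target_q_next
    return target_q / num_sample


def toy_q(obs_n, act_n):
    # Hypothetical centralized target Q: just the sum of all agents' actions.
    return sum(act_n)


obs_next_n = [1.0, 2.0]
deterministic = [lambda obs: obs, lambda obs: obs]

rng = random.Random(0)
noisy = [lambda obs: obs + rng.gauss(0.0, 1.0) for _ in range(2)]

one_draw = estimate_target_q(0.5, 0.0, 0.9, obs_next_n, deterministic, toy_q, 1)
many_draws = estimate_target_q(0.5, 0.0, 0.9, obs_next_n, deterministic, toy_q, 4)
noisy_estimate = estimate_target_q(0.5, 0.0, 0.9, obs_next_n, noisy, toy_q, 100)
print(one_draw, many_draws, noisy_estimate)
```

With deterministic target policies every draw is identical, so num_sample = 1 and larger values give the same target. Extra samples only change the estimate when the next actions are stochastic.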
Sorry, I didn't notice it when I first read the code. Now I see the problem: in this code, the loop variable and the variable inside the list comprehension share the same name, i.
num_sample = 1 means that only 1 sample is used from experience replay.
There are two variables both named i.
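A small sketch of the name reuse, with a hypothetical helper collect_agent_indices standing in for the update loop. In Python 3 a list comprehension has its own scope, so the inner i does not overwrite the outer loop variable. In Python 2 the comprehension variable leaks, but the for loop still takes its next value from the range on each iteration.

```python
def collect_agent_indices(num_sample, n_agents):
    """Mimic the outer sample loop and the inner per-agent comprehension, both using i."""
    seen = []
    for i in range(num_sample):
        # The comprehension's i ranges over agents, not over samples.
        agent_indices = [i for i in range(n_agents)]
        # In Python 3 the outer i is untouched by the comprehension above.
        seen.append((i, agent_indices))
    return seen


print(collect_agent_indices(2, 3))
```

Every agent index is visited on each pass, so the shared name looks confusing but does not appear to change the result. Renaming one of the two variables would make the intent clearer.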