Double-Deep-Q-Learning-for-Resource-Allocation

About co-ordinations

zhiwei-roy-0803 opened this issue on Aug 26, 2020 · 2 comments

Hi, thanks for your awesome reproduction! As mentioned in the paper, it uses so-called co-ordination to stabilize the training of the agent. However, from the training process, it seems no such procedure was introduced. Also, I have noticed that in the testing phase some agents take actions without affecting the environment; is this the so-called co-ordination? If so, the co-ordination occurs only in the testing phase, so how can it help the agents avoid collisions? Looking forward to your reply.

zhiwei-roy-0803 · Aug 26 '20 12:08

Hey, I am not able to understand the question properly. If possible, can you please share the snippet of the code you have doubts about? If the doubt is not in the code, can you please share the part of the article that causes this confusion?

Engineer1999 · Aug 27 '20 04:08

Hi, this code is quite confusing to me:

```python
for game_idx in range(number_of_game):
    self.env.new_random_game(self.num_vehicle)
    test_sample = 200
    Rate_list = []
    print('test game idx:', game_idx)
    for k in range(test_sample):
        for i in range(len(self.env.vehicles)):
            self.action_all_with_power[i, :, 0] = -1
            sorted_idx = np.argsort(self.env.individual_time_limit[i, :])
            for j in sorted_idx:
                state_old = self.get_state([i, j])
                action = self.predict(state_old, self.step, True)
                self.merge_action([i, j], action)
            if i % (len(self.env.vehicles) / 10) == 1:
                action_temp = self.action_all_with_power.copy()
                reward, percent = self.env.act_asyn(action_temp)  # self.action_all
                Rate_list.append(np.sum(reward))
```

It seems that in the testing phase some links (agents) take actions without immediately observing the next states given by the environment. Is this the so-called co-ordination? If so, why does it happen in the testing phase instead of the training phase? From the name of the function, act_asyn, it appears to be the "asynchronous update" mentioned in the last paragraph of Section III of the paper. In that paragraph, the aim of the technique is to make the agents learn to cooperate, but in the training phase the function act_for_training is used instead of act_asyn. That does not make sense to me. What do you think?

zhiwei-roy-0803 · Aug 27 '20 08:08
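
To make the synchronous-versus-asynchronous distinction in the snippet above concrete, here is a minimal sketch. Everything in it (the toy environment, its `apply` method, the group size) is a hypothetical stand-in, not the repository's actual `act_for_training` / `act_asyn` implementations; it only mirrors the control flow being asked about, namely whether later agents observe a shared state that already reflects earlier agents' actions.

```python
import numpy as np

class ToyEnv:
    """Hypothetical toy environment; only the shared-state update pattern matters."""
    def __init__(self, n_agents):
        self.n_agents = n_agents
        self.interference = np.zeros(n_agents)  # stand-in for the shared channel state

    def apply(self, actions, agent_ids):
        # Write the chosen actions into the shared state, then return a toy
        # "sum rate" that depends on the interference currently on record.
        for i, a in zip(agent_ids, actions):
            self.interference[i] = a
        return 1.0 / (1.0 + self.interference.sum())

def synchronous_step(env, policy):
    # All agents choose against the same stale observation, and every action
    # is applied to the environment in one shot (act_for_training-like flow).
    actions = [policy(env.interference) for _ in range(env.n_agents)]
    return env.apply(actions, list(range(env.n_agents)))

def asynchronous_step(env, policy, group_size=2):
    # Agents act in small groups; each later group observes interference that
    # already reflects the earlier groups' choices (act_asyn-like flow).
    rewards = []
    for start in range(0, env.n_agents, group_size):
        ids = list(range(start, min(start + group_size, env.n_agents)))
        actions = [policy(env.interference) for _ in ids]
        rewards.append(env.apply(actions, ids))
    return float(np.mean(rewards))

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    policy = lambda obs: rng.uniform(0.0, 1.0)  # stand-in for the DQN's greedy action
    print("synchronous reward :", synchronous_step(ToyEnv(6), policy))
    print("asynchronous reward:", asynchronous_step(ToyEnv(6), policy))
```

In this toy version the asynchronous path is the one where an agent acting later in the loop "sees" the consequences of earlier agents' choices before picking its own action, which is the behaviour the question attributes to act_asyn during testing.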