
small command issue

Open hanghoo opened this issue 1 year ago • 10 comments

Hi there! I'm trying to reproduce your code and found a small issue in the offline training setup; hope it's helpful. The command should be `PYTHONPATH=./ python3 ./sim_script_example/ka.py` instead of `PYTHONPATH=./ python3 ./sim_sript_example/ka.py`.

hanghoo avatar Apr 21 '23 20:04 hanghoo

By the way, I find it a little confusing when the thread is involved, especially with `asynchronization=False`. Do you have any suggestions for debugging the program with breakpoints? Thank you very much.

hanghoo avatar Apr 28 '23 15:04 hanghoo

Hi Hanghoo, the thread (i.e., setting `asynchronization = True`) is only useful in online experiments. When it is set to False in the offline experiment, the algorithm behaves sequentially: generate one transition, save the transition, run one training step, generate the next transition, and so on. This alternation is enforced by mutex locks and follows the algorithm flow in our paper.
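To make that flow concrete, here is a minimal, self-contained sketch of how a pair of mutex-style locks can force that strict "collect one transition, train one step" alternation between two threads. All names here are illustrative stand-ins, not the actual classes in drl-5g-scheduler:

```python
import threading

# Two semaphores acting as hand-off locks: the collector may run first,
# the learner waits until one transition has been saved.
can_collect = threading.Semaphore(1)
can_train = threading.Semaphore(0)

replay = []   # stand-in for the replay memory
N_STEPS = 5

def collector():
    for t in range(N_STEPS):
        can_collect.acquire()
        replay.append(f"transition-{t}")  # generate and save one transition
        can_train.release()               # hand control to the learner

def learner():
    for t in range(N_STEPS):
        can_train.acquire()
        print(f"training step {t} on {replay[-1]}")  # exactly one update per transition
        can_collect.release()             # hand control back to the collector

threads = [threading.Thread(target=collector), threading.Thread(target=learner)]
for th in threads:
    th.start()
for th in threads:
    th.join()
```

With `asynchronization = True`, the hand-off is removed and the two threads run freely, which is only meaningful in online experiments.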

zhouyou-gu avatar Apr 29 '23 08:04 zhouyou-gu

Hi Zhouyou, thank you very much for your response. Yes, I found the mutex locks in `step()` and `sample()`. May I ask what `_per_w_multiplier()` does? Also, any suggestions on the multi-head critic implementation would be very helpful. Thank you very much.

hanghoo avatar Apr 29 '23 16:04 hanghoo

Hi hanghoo, `_per_w_multiplier()` adjusts the weight of each sample (each transition) according to the delay of each user's queue; this implements the importance sampling. You can find the math expressions in our paper. Queue delay may or may not be a state feature in your application, so the multiplier can be reconfigured to use whichever state features you have. As for the multi-head critic, you can view it as several critics running in parallel, one critic per user; the expression is in our paper as well.
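For illustration, a minimal PyTorch sketch of a multi-head critic with one Q-value head per user. The shared body and layer sizes are assumptions made for this example, not the exact network from the paper or the repo:

```python
import torch
import torch.nn as nn

class MultiHeadCritic(nn.Module):
    """Several critics in parallel: a shared body plus one output head per user."""
    def __init__(self, state_dim, action_dim, n_users, hidden=64):
        super().__init__()
        self.body = nn.Sequential(
            nn.Linear(state_dim + action_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
        )
        # one independent linear head per user, evaluated in parallel
        self.heads = nn.ModuleList(nn.Linear(hidden, 1) for _ in range(n_users))

    def forward(self, state, action):
        h = self.body(torch.cat([state, action], dim=-1))
        return torch.cat([head(h) for head in self.heads], dim=-1)  # (batch, n_users)

critic = MultiHeadCritic(state_dim=8, action_dim=4, n_users=3)
q = critic(torch.randn(32, 8), torch.randn(32, 4))
print(q.shape)  # torch.Size([32, 3]): one Q-value per user
```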

zhouyou-gu avatar Apr 30 '23 09:04 zhouyou-gu

Hi Zhouyou, thank you very much for your detailed explanation; that's very helpful, and I have read your paper. If I understand correctly, the `weights` in the line `l_critic = torch.mul(l_critic_per_batch, weights)` are the current importance-sampling weights, while `_per_w_multiplier()` adjusts the importance-sampling weights for the following batches, right? Thank you.

hanghoo avatar May 03 '23 15:05 hanghoo

Yes, that's correct. More precisely, `weights` in `l_critic = torch.mul(l_critic_per_batch, weights)` and `l_actor = torch.mul(l_actor_per_batch, weights)` corrects the bias caused by importance sampling, while `ret_per_e = to_numpy(l_critic); ret_per_e = ret_per_e * self._per_w_multiplier(batch)` sets the weight of each transition for the following iterations. Details can be found in our paper; note that the terms (variable names) may not map one-to-one between the code and the paper.
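A self-contained toy sketch of these two roles; `per_w_multiplier` here is a hypothetical stand-in for `self._per_w_multiplier()`, and the losses and delays are random placeholders:

```python
import torch

def per_w_multiplier(batch_delays):
    # stand-in: up-weight transitions whose queues saw longer delays
    return 1.0 + batch_delays

l_critic_per_batch = torch.rand(4)  # per-sample critic losses
weights = torch.rand(4)             # importance-sampling corrections
batch_delays = torch.rand(4)        # queue delays of the sampled batch

# Role (1): correct the sampling bias by weighting each sample's loss
# before back-propagation.
l_critic = torch.mul(l_critic_per_batch, weights)

# Role (2): compute new per-transition weights, scaled by the delay-based
# multiplier, to be stored back into the replay memory for future sampling.
ret_per_e = l_critic.detach().numpy() * per_w_multiplier(batch_delays).numpy()
print(l_critic.mean().item(), ret_per_e)
```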

zhouyou-gu avatar May 05 '23 16:05 zhouyou-gu

Thank you very much for your answers.

  1. Do you think the multi-head critic architecture can be transplanted to auto-entropy SAC? If you can share any references on multi-head critics, that would be very helpful.
  2. Comparing `class DDPG` with `class MultiHeadCriticDDPG_NEW_PER`, the only difference is whether `_per_w_multiplier` is applied. So, with a normal (uniform) replay memory, is there any difference between single and multiple heads? Thank you very much.

hanghoo avatar May 05 '23 19:05 hanghoo

Hi, hanghoo. For 1, I have not used SAC before, so I cannot say. For 2, there is no difference.

zhouyou-gu avatar May 09 '23 08:05 zhouyou-gu

Hi @zhouyou-gu, thank you very much for all your help.

hanghoo avatar May 09 '23 19:05 hanghoo

Hi @zhouyou-gu, a quick question: is there any reference that supports the multi-head critic architecture?

hanghoo avatar May 11 '23 15:05 hanghoo