pensieve
Question about the convergence problem
I ran multi_agent.py with the number of iterations of central_agent set to 10^5 and the learning rate set as you proposed. I used TensorBoard to check the td_loss curve, but it does not converge and fluctuates over a very wide range. Could you give me some guidance on this convergence issue? I would really appreciate your help.
Did you see the reward improving on the validation dataset? The td_loss (policy gradient loss) can have large variance for many reasons; one important source is the stochasticity of the input process (the network trace in this case). This paper provides more details about it: https://openreview.net/forum?id=Hyg1G2AqtQ, https://people.csail.mit.edu/hongzi/var-website/index.html
Thanks for the reply. The paper will be a good guide. I have read issue https://github.com/hongzimao/pensieve/issues/11. I think the poor convergence may be because I did not reduce the entropy weight to a small value after a certain number of iterations but kept it at 5. I will try this, and I will also study the paper. I have another question about the dataset: which trace dataset was used for the pretrain_model in the code? Can I use synthetic traces as the pretraining dataset? I know the data in the Dropbox link is a subset, but I can't access it. I also can't download the FCC broadband dataset, the Norway HSDPA bandwidth logs, the Belgium 4G/LTE bandwidth logs (bonus), or the homewifi dataset. Are they no longer accessible?
I think you can still download the data from that Dropbox link. Synthetic data can be generated using https://github.com/hongzimao/pensieve/blob/master/sim/synthetic_traces.py
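On the entropy weight discussed above: one way to reduce it over the course of training, rather than keeping it fixed, is a simple linear schedule. The snippet below is only a minimal sketch and is not taken from the pensieve code. The function name `entropy_weight_at` and the parameters `end` and `decay_iters` are hypothetical; the starting value of 5 is the one mentioned in this thread, while the final value and the decay length are assumptions you would need to tune.

```python
def entropy_weight_at(iteration, start=5.0, end=0.1, decay_iters=50000):
    """Linearly anneal the entropy weight from `start` to `end`.

    Hypothetical helper, not part of pensieve. `end` and `decay_iters`
    are assumed values for illustration only.
    """
    if decay_iters <= 0:
        return end
    # Fraction of the decay completed, clamped to [0, 1].
    frac = min(max(float(iteration) / decay_iters, 0.0), 1.0)
    # Interpolate between the starting and final weights.
    return (1.0 - frac) * start + frac * end


if __name__ == "__main__":
    # Print the weight at a few points of a 10^5-iteration run.
    for it in (0, 25000, 50000, 100000):
        print(it, entropy_weight_at(it))
```

The value returned for the current iteration would then be used wherever the training code applies its entropy weight; how to wire that in depends on how the weight is defined in your copy of the code.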