Question about training time

Open sandman opened this issue 6 years ago • 18 comments

Hi, how long does the training process take? I am running the TensorFlow CPU version on an i7-4720HQ CPU @ 2.60GHz. The training has been running for a couple of hours now.

~Sandip

sandman avatar May 12 '18 15:05 sandman

@sandman what's the size of your training data?

Dreamer-hxs avatar May 17 '18 13:05 Dreamer-hxs

@Dreamer-hxs I am using all the available training traces from Hongzimao's original link (127 files in total).

sandman avatar May 17 '18 20:05 sandman

@sandman In fact, the author says in his paper: "Training a single algorithm required approximately 50,000 iterations, where each iteration took 300 ms and corresponded to 16 agents updating their parameters in parallel." I trained with the same 127 files myself and found each iteration took around 240 ms (300,000 iterations --> about 25 hours). It seems that the data size has little impact on the time per iteration; I don't know what training set size the author used. I also don't know whether overfitting is a problem here; I just took it for granted that more iterations means more accuracy.

Dreamer-hxs avatar May 18 '18 09:05 Dreamer-hxs

The training time is roughly on the same scale as @Dreamer-hxs's calculation. Two caveats: 1. The paper uses more data than what is in the repo. The small dataset is meant to let others quickly reproduce the first-order result and get a sense of the learning approach. 2. Throughout training, the entropy weight needs to be decayed. We did not automate this process, so you can see explicitly how decaying the entropy improves the policy.

hongzimao avatar May 19 '18 02:05 hongzimao

How can we decay the entropy during training? By simply adjusting the parameter in a3c.py manually over time?

hudson-ayers avatar May 31 '18 08:05 hudson-ayers

Yes, and remember to load the previous trained model to bootstrap.

hongzimao avatar May 31 '18 14:05 hongzimao

We are attempting to train on a different video, so we cannot load the previous model. Do we have to stop the training in order to adjust the parameter in a3c.py, or is that file reloaded over time?

hudson-ayers avatar Jun 01 '18 23:06 hudson-ayers

If your different video has a different number of bitrates, the previous model won't fit. To train a single model that works across multiple videos, please refer to Section 4.3 of the paper on multi-video training. The code for it is in https://github.com/hongzimao/pensieve/tree/master/multi_video_sim.

hongzimao avatar Jun 02 '18 19:06 hongzimao

For training on the original video, I am still a little unclear on how to adjust the entropy parameter. Is simply modifying the ENTROPY_WEIGHT parameter while training is running sufficient, or do I need to stop the training, modify the parameter, and then somehow resume it?

hudson-ayers avatar Jun 02 '18 20:06 hudson-ayers

The latter. To resume training, you can modify https://github.com/hongzimao/pensieve/blob/master/sim/multi_agent.py#L34-L35 to load the saved model.
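The stop-adjust-resume schedule described here can be sketched as a small loop (a hypothetical Python outline; the entropy weights, iteration counts, and the `train` callable are illustrative stand-ins, not constants from the repo):

```python
# Hedged sketch of manual entropy-weight decay with checkpointed restarts.
# Each stage trains at one entropy weight, then the next stage bootstraps
# from the checkpoint the previous stage produced. Values are illustrative.

SCHEDULE = [(5.0, 20000), (1.0, 20000), (0.1, 10000)]  # (weight, iterations)

def staged_training(train, schedule=SCHEDULE):
    """Run one training stage per entropy weight, restoring each stage
    from the checkpoint returned by the previous one (None = fresh start)."""
    checkpoint = None
    for weight, iters in schedule:
        checkpoint = train(entropy_weight=weight,
                           num_iters=iters,
                           restore_from=checkpoint)
    return checkpoint
```

In the actual repo, "restoring" corresponds to pointing the NN_MODEL constant in multi_agent.py at the last saved checkpoint before relaunching.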

hongzimao avatar Jun 03 '18 16:06 hongzimao

Hi, I have a query: what do we call an iteration here? Is it one pass over the network traces, one pass (download) over the 48 chunks of video, or one chunk download? Thanks

jlabhishek avatar Dec 07 '18 07:12 jlabhishek

One pass over 48 chunks. You can check env.py for the condition when the episode ends.
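The episode boundary described above can be sketched roughly as follows (a hypothetical outline; `fetch_chunk` is a stand-in for the simulator step in env.py, not its real API):

```python
# Hypothetical sketch of one training iteration (episode): the agent
# steps through every chunk of one video playback, then the episode ends.
# TOTAL_CHUNKS matches the 48-chunk video discussed above.

TOTAL_CHUNKS = 48

def run_episode(fetch_chunk):
    """Collect one trajectory: one experience tuple per chunk download."""
    trajectory = []
    for chunk_idx in range(TOTAL_CHUNKS):
        trajectory.append(fetch_chunk(chunk_idx))
        end_of_video = (chunk_idx == TOTAL_CHUNKS - 1)  # episode boundary
    return trajectory
```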

hongzimao avatar Dec 09 '18 18:12 hongzimao

Hi, I have two questions. 1. Does one loop in the function central_agent (in multi_agent.py) mean 16 iterations or 1 iteration? 2. Is it feasible to adjust the entropy weight automatically, e.g., entropy_weight = original_weight - decay_factor * iteration? Thanks!

YihuaZou avatar Nov 21 '19 07:11 YihuaZou

Thanks for your questions. 1. I think it means 1 iteration. 2. Linear decay of entropy looks good; it's what we've been doing in related projects too. Example: https://github.com/hongzimao/decima-sim/blob/master/train.py#L397-L398 and https://github.com/hongzimao/decima-sim/blob/master/utils.py#L39-L44
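A linear decay schedule like the one suggested here can be written as a small helper (a sketch; the start/end weights and decay horizon are illustrative defaults, not repo constants):

```python
# Sketch of linear entropy-weight annealing: start high to encourage
# exploration, decay to a small floor, then hold the floor.

def entropy_weight(iteration, start=3.0, end=0.01, decay_iters=100000):
    """Linearly anneal from `start` at iteration 0 down to `end` at
    `decay_iters`, then hold `end` for all later iterations."""
    if iteration >= decay_iters:
        return end
    return start - (start - end) * iteration / decay_iters
```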

hongzimao avatar Nov 21 '19 16:11 hongzimao

Hi, I have tried linear decay of the entropy weight (3 for the first iteration, decaying linearly to 0.01 after 1e5 iterations). After 120,000 iterations I got a result better than MPC, but it is still worse than the model you provided. So I want to know: is this because the training data is not enough (I downloaded the traces you provided on Dropbox, and I notice that it is a subset), or the training is not enough, or is it something else?

YihuaZou avatar Nov 28 '19 12:11 YihuaZou

More training data can help improve the performance. Our paper used a larger dataset to train the agent; this repo initially shipped a smaller subset so others could quickly generate results. You can follow the instructions in https://github.com/hongzimao/pensieve/blob/master/traces/README.md to generate more data (even more than we used in the paper, if you want). However, just the data provided on Dropbox should give you a model comparable to the one we provided. Many others have reproduced, or even surpassed, our results with only the existing data.

hongzimao avatar Nov 28 '19 16:11 hongzimao

@hongzimao Thanks for replying. Your reply helps me a lot and I will retry with different parameters. But I have another question: how do we judge that the network has converged? As in #76, I find that td_loss (the policy gradient loss) has large variance, so it is difficult to judge convergence from it. Maybe check the test results?

YihuaZou avatar Dec 01 '19 13:12 YihuaZou

Checking validation reward is a sensible way to check convergence. We also find entropy (you can normalize it by log(n), where n is the number of actions) a good metric for checking convergence. Hope this helps!
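The normalized-entropy metric can be computed as below (a minimal sketch: dividing the policy entropy by log(n), its maximum, maps it to [0, 1], where 1 is a uniform exploratory policy and values near 0 suggest a converged, near-deterministic policy):

```python
import math

def normalized_entropy(probs):
    """Entropy of a discrete action distribution, divided by log(n)
    so the result lies in [0, 1] regardless of the action count n."""
    n = len(probs)
    h = -sum(p * math.log(p) for p in probs if p > 0)
    return h / math.log(n)
```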

hongzimao avatar Dec 04 '19 03:12 hongzimao